ABSTRACT

We present photometric redshift estimates for galaxies used in the weak lensing
analysis of the Dark Energy Survey Science Verification (DES SV) data. Four model- or
machine learning-based photometric redshift methods - ANNZ2, BPZ calibrated against
BCC-UFig simulations, SKYNET, and TPZ - are analysed. For training, calibration, and
testing of these methods, we construct a catalogue of spectroscopically confirmed
galaxies matched against DES SV data. The performance of the methods is evaluated
against the matched spectroscopic catalogue, focusing on metrics relevant for
weak lensing analyses, with additional validation against COSMOS photo-zs. From
the galaxies in the DES SV shear catalogue, which have mean redshift 0.72 ± 0.01
over the range 0.3 < z < 1.3, we construct three tomographic bins with means of
z = {0.45, 0.67, 1.00}. These bins each have systematic uncertainties δz < 0.05 in the
mean of the fiducial SKYNET photo-z n(z). We propagate the errors in the redshift
distributions through to their impact on cosmological parameters estimated with cosmic
shear, and find that they cause shifts in the value of σ8 of approximately 3%. This shift is
within the one sigma statistical errors on σ8 for the DES SV shear catalogue. We further
study the potential impact of systematic differences on the critical surface density,
Σ_crit, finding levels of bias safely less than the statistical power of DES SV data. We
recommend a final Gaussian prior for the photo-z bias in the mean of n(z) of width
0.05 for each of the three tomographic bins, and show that this is a sufficient bias
model for the corresponding cosmology analysis.

Key words: cosmology: distance scale - galaxies: distances and redshifts - galaxies: 
statistics - large scale structure of Universe - gravitational lensing: weak 
1 INTRODUCTION

One of the key goals of the Dark Energy Survey (DES) is
to extract cosmological information from measurements of
weak gravitational lensing. Gravitational lensing (for discussion
see Narayan & Bartelmann 1996; Refregier 2003; Munshi
et al. 2008, and references therein) involves the deflection
of light from distant galaxies by intervening matter along the
line-of-sight. Lensing encodes information in the shapes of
background objects (i.e., galaxies) on both the statistical
properties of intervening matter perturbations and cosmological
distances to the sources. The primary challenge in
studying gravitational lensing in the weak regime has been
the difficulty in measuring the shapes of galaxies in an unbiased
way. For a detailed discussion of galaxy shape measurements
in DES SV, see Jarvis et al. (2015). However, a weak
lensing analysis requires not only the careful measurement
of the shapes of galaxies, but also an accurate and unbiased
estimate of redshifts to a large ensemble of galaxies.

Knowing the redshifts of the galaxies in a sample (or
equivalently, their distances for a given cosmological model)
allows us to differentiate near and distant galaxies and
thereby reconstruct the redshift-dependence of the lensing
signal. Hence separating galaxies into redshift bins strongly
improves the constraining power of cosmic shear on cosmological
model parameters (Hu 1999). Extensive studies have
been reported in the literature that look for optimal
configurations of redshift binning and requirements for future
ambitious surveys covering several thousand square degrees
(Amara & Refregier 2007; Banerji et al. 2008; Cai et al. 2009;
Sun et al. 2009; Bernstein & Huterer 2010; Abdalla et al.
2008; Bellagamba et al. 2012; Cunha et al. 2012; Bordoloi
et al. 2010, 2012; Sheldon et al. 2012; Cunha et al. 2014). In
addition to gains in statistical precision, separating galaxies
into tomographic bins can also mitigate astrophysical
systematics. For example, moving to a tomographic analysis
allows us to better isolate the intrinsic correlations of
galaxy shapes in the absence of lensing (see Troxel & Ishak
2015; Kirk et al. 2015, and references therein), whereas a
non-tomographic analysis may otherwise be limited by
uncertainties in the impact of this intrinsic galaxy alignment
(for more, see DES et al. 2015).

Given the large number of galaxies that make up a lensing
sample in a wide field imaging survey, redshifts must be
estimated using photometry measured in a series of (typically)
broad bands. This method of estimating photometric
redshifts is known as photo-z (see Hildebrandt et al. 2010,
and discussion and references therein). Achieving the high
level of precision necessary to ensure that the systematic
contributions to cosmological parameter uncertainties due
to photo-z bias are of the order of the statistical uncertainty
is challenging, as is the necessary validation of the derived
redshifts (Mandelbaum et al. 2008; Hildebrandt et al. 2012;
Benjamin et al. 2013; Schmidt & Thorman 2013; Banerji
et al. 2015; Sanchez et al. 2014). Previous weak lensing surveys
have tackled this problem in a variety of innovative
ways. For example, see Hildebrandt et al. (2012) and Benjamin
et al. (2013) for the discussion of this problem in the
CFHTLenS survey (Erben et al. 2013; Heymans et al. 2013)
and Schmidt & Thorman (2013) in the Deep Lens Survey
(Wittman et al. 2002). Substantial and dedicated efforts are
required to improve current performance and achieve the
target precision in on-going and future surveys. The challenging
target set for the full Dark Energy Survey is that
the biases in redshift estimates of the means of tomographic
bins should be below δz = 0.003, which is based on the desire
to keep redshift systematic errors subdominant to the
statistical errors of the lensing surveys (Amara & Refregier
2007; Abdalla et al. 2008).

In this work we explore accurate and precisely characterised
photo-z estimates of n(z), the result of stacking
the individual probability distribution functions p(z), with
the Science Verification (SV) data of DES. At 139 square
degrees, the precision requirements for DES SV weak lensing
analyses are significantly weaker than those for the full DES
survey data. As such we target precision at the few percent
level for the mean redshifts of a given population of galaxies.
This will allow us to have photo-z uncertainties comparable
to or lower than the statistical errors on the cosmological
parameters we are best able to constrain (e.g., σ8). We can
study the impact of redshift precision directly by propagating
the expected photo-z bias to the constraints on σ8, but
also by comparing the differences in final predictions for σ8
over the full DES SV shear catalogue from each of four different
independent photometric redshift methods.
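
As an illustration of the stacking step mentioned above, the following minimal sketch builds n(z) from a set of per-galaxy p(z) and computes its mean redshift. It is not the pipeline used in this work: the function names, the redshift grid, and the toy Gaussian p(z) are assumptions made purely for the example, and optional per-galaxy weights stand in for any lensing weights one might apply.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y over x along the last axis."""
    y = np.asarray(y, dtype=float)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def stack_pdfs(z_grid, pdfs, weights=None):
    """Stack per-galaxy p(z), one row per galaxy, into a normalised n(z)."""
    pdfs = np.asarray(pdfs, dtype=float)
    if weights is None:
        weights = np.ones(pdfs.shape[0])
    # Normalise every p(z) to unit integral before stacking.
    pdfs = pdfs / trapezoid(pdfs, z_grid)[:, None]
    nz = np.average(pdfs, axis=0, weights=np.asarray(weights, dtype=float))
    return nz / trapezoid(nz, z_grid)

def mean_redshift(z_grid, nz):
    """Mean redshift of a distribution n(z) sampled on z_grid."""
    return trapezoid(z_grid * nz, z_grid) / trapezoid(nz, z_grid)

# Toy example (hypothetical data): three narrow Gaussian p(z).
z_grid = np.linspace(0.0, 1.8, 181)
centres = np.array([0.4, 0.7, 1.0])
pdfs = np.exp(-0.5 * ((z_grid[None, :] - centres[:, None]) / 0.05) ** 2)
nz = stack_pdfs(z_grid, pdfs)
mean_z = mean_redshift(z_grid, nz)
```

A bias δz in the mean of a tomographic bin can then be read off as the difference between `mean_redshift` evaluated on the photo-z n(z) and on a reference distribution.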

The paper is organised as follows. In section 2 we introduce
the data products that are used in our studies. In
sections 3 and 4 we investigate the global properties of the
lensing sample including magnitude, colour and redshift
distributions. We also discuss the limitations of existing
spectroscopic samples. In section 5 we extend our analysis to
tomographic cases and the impact on cosmological parameters
is explored in section 6. Our conclusions are summarised
in section 7.


2 DATA SETS 

Prior to the start of the main Dark Energy Survey, the Dark
Energy Camera (DECam; Flaugher et al. 2012; Diehl 2012;
Honscheid et al. 2012; Flaugher et al. 2015), with a hexagonal
footprint of 570 Megapixels, was tested during a preliminary
Science Verification (SV) survey from November
2012 to February 2013. These observations produced a usable
DES SV galaxy catalogue with which measurement and
analysis pipelines have been tested to produce early science
results. The DES SV survey mimics full 5-year DES survey
parameters over a small patch of the sky, but with significant
depth variations due to weather and other challenges during
early operations of DECam (see e.g., Leistedt et al. 2015).
The contiguous area used for the DES SV shear catalogue
is contained within the South Pole Telescope east (SPT-E)
observing region (Carlstrom et al. 2011), and covers approximately
139 square degrees in five optical filters, g, r, i, z,
and Y. We note that the Y band was not used in this work.

In this section we present the DES SV data products
relevant for photometric redshift estimation. We also build
a catalogue of precise and reliable spectroscopic redshifts by
collating a number of proprietary and public spectroscopic
datasets that also have DES photometric observations available.
This is essential to test the methods for photo-z estimates
used in this work. Finally, we describe a set of simulations
of the DES SV survey that we use as a secondary
method of calibrating and validating the photo-z estimates.


MNRAS 000, 000-000 (0000) 


Redshifts of DES Science Verification weak lensing galaxies 3 


2.1 DES SV Photometry and Gold Catalogue 

DES data from the SV season were reduced by the SVA1
version of the DES Data Management system (Mohr et al.
2012), using SCAMP (Bertin 2006), SWarp (Bertin et al.
2002) and bespoke software packages, as described in Sevilla
et al. (2011), Desai et al. (2012) and Mohr et al. (2012).
To summarise, the single-epoch images were calibrated,
background-subtracted, coadded, and processed in ‘tiles’
(0.75 × 0.75 deg² squares) defined to cover the entire DES
footprint. A catalogue of objects was extracted from the
coadded images using Source Extractor (SExtractor,
Bertin & Arnouts 1996; Bertin 2011). In what follows we use
AB magnitudes and MAG_AUTO measurements performed in
coadd images, which are reliable for SV galaxies (e.g., robust
to sharp PSF variations across coadd images) and used
in most SV analyses (e.g., Crocce et al. 2015). However,
note that shape measurements are performed in single-epoch
images with a dedicated pipeline using multi-epoch fitting
techniques, as described in Jarvis et al. (2015). The analysis
presented in this work will be concerned with the objects
that meet the quality cuts of that pipeline.

The main catalogue of reliable objects in DES SV is
the Gold catalogue described in Rykoff et al. (2015). It
starts with all objects detected in SV images and successively
applies quality cuts to reject objects and regions that
are deemed problematic (e.g., regions with poor observations
or photometry). To be included in the Gold catalogue,
a galaxy must:

• be observed at least once in all four griz bands;

• be at a declination above −61° to avoid regions of bad
photometric calibration (e.g., the Large Magellanic Cloud);

• not be in regions with galaxy surface density > 3σ below
the mean;

• not be in regions surrounding bright stars;

• not be in regions with a concentration of large centroid
shifts or dropouts between bandpasses.

Further information on star-galaxy separation and quality
cuts at the shape measurement level is given in detail
in Jarvis et al. (2015).

2.2 DES SV Shear Catalogue 

Two semi-independent shear pipelines - im3shape and ngmix
- have been produced for a subset of objects in the DES
SV Gold Catalogue in the SPT-E region of the sky. These
are described further in Jarvis et al. (2015), but relevant
details are summarised below. The two shear pipelines produce
separate shear measurements for each galaxy, and thus
select a different subset of the galaxies in the Gold Catalogue
as having well-measured shears. This leads to a different
population of galaxies used by either pipeline in constructing
the n(z) for each tomographic bin in a weak lensing
analysis, though the im3shape selection is nearly a subset
of the ngmix selection. The final shear catalogue is the
intersection of the Gold galaxy selection, these shear-related
cuts, and a final ‘good’ galaxy selection for lensing that removes
objects with SExtractor flags = 1, 2, very low
surface brightness objects, very small objects, or those with
colours outside reasonable bounds (−1 < g − r < 4 and
−1 < i − z < 4). These selection effects also produce slightly
different photometric properties in the galaxy sample used.
This is demonstrated in Fig. 1, where the i-band magnitude
histogram is compared for all ‘Gold’ objects, all galaxies,
‘good’ galaxies as defined above, and finally the two shear
selections.

Figure 1. i-band magnitude histograms for various levels of cuts
from the full Gold catalogue down to the final shear catalogue.

• im3shape: The im3shape shear measurement pipeline
is built on the im3shape code discussed in Zuntz et al.
(2013) and modified as described in Jarvis et al. (2015). The
im3shape code is a forward-modelling maximum-likelihood
method that fits two galaxy models to an image in the r
band: an exponential disc and a de Vaucouleurs bulge. The
best-fitting model is then used to estimate the ellipticity.
Inverse variance weights are calculated for each galaxy empirically
in bins of size and signal-to-noise. The final im3shape
shear catalogue has a number density of ~4.2 galaxies per
square arcminute.

• ngmix: The ngmix shear measurement pipeline represents
simple galaxy models as the sum of Gaussians (Hogg
& Lang 2013). The same model shape is fit simultaneously
across the riz bands, with parameters sampled via Markov
Chain Monte Carlo (MCMC) techniques. Ellipticities are
then estimated using the lensfit algorithm (Miller et al.
2007) with priors on the intrinsic ellipticity distribution from
GREAT3 (Mandelbaum et al. 2014). Inverse variance weights
are calculated for each galaxy from the covariance of the
shape estimate and an intrinsic shape noise estimate. The
final ngmix shear catalogue has a number density of ~6.9
galaxies per square arcminute.

Throughout this work we use the ngmix catalogue as the
default weak lensing sample unless explicitly stated otherwise.

2.3 Spectroscopic Catalogues 

To train and assess the performance of the photometric redshifts
we assemble a matched catalogue of galaxies that are
observed with both DECam and a spectrograph. In this section
we describe the photometric and spectroscopic properties
of this matched catalogue. Objects are matched on the
sky within a matching radius of 1.5 arcseconds. The spectra
used come from six distinct areas on the sky and contain a
total of 46139 galaxies. The distributions of these fields on
the sky relative to the main DES SV SPT-E field are shown
in Fig. 2. In Table 1 the general properties of the spectroscopic
surveys used in this matched catalogue are listed, but
for a more detailed description of the properties (e.g., the
quality flags used), we refer the reader to Appendix A.

Figure 2. Location of the six spectral fields and the main DES
SV (SPT-East) field on the sky. The SN fields are the DES supernova
fields while the other two have been observed with DECam
outside of the DES survey.

Spectroscopic survey    Count    Mean i    Mean z
VIPERS                   7286     21.52      0.69
GAMA                     7276     18.61      0.22
zCOSMOS                  5442     20.93      0.51
VVDS F02 Deep            4381     22.40      0.68
SDSS                     4140     18.82      0.39
ACES                     3677     21.73      0.58
VVDS F14                 3603     20.61      0.49
OzDES                    3573     19.85      0.47
ELG cosmos               1278     22.22      1.08
SNLS                      857     21.09      0.55
UDS VIMOS                 774     22.54      0.85
2dFGRS                    725     17.52      0.13
ATLAS                     722     18.96      0.35
VVDS spF10 WIDE           661     21.16      0.53
VVDS CDFS DEEP            544     22.05      0.62
UDS FORS2                 311     23.80      1.25
PanSTARRS MMT             297     19.94      0.35
VVDS Ultra DEEP           264     23.71      0.88
PanSTARRS AAOmega         239     19.69      0.32
SNLS AAOmega               81     21.16      0.56

Table 1. The number of galaxies included in the matched
spectroscopic catalogue is listed for each spectroscopic survey,
with the corresponding mean i-band magnitude and mean
redshift. Further details can be found in Appendix A.
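
The positional match described in this section (a 1.5 arcsecond matching radius) can be sketched as follows. This is a brute-force illustration only, under the assumption that positions are given as right ascension and declination in degrees; the function names are hypothetical, and for catalogues of realistic size a tree-based neighbour search would be needed instead of the full pairwise distance matrix used here.

```python
import numpy as np

def radec_to_unit(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to unit vectors on the sphere."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])

def match_within_radius(ra1, dec1, ra2, dec2, radius_arcsec=1.5):
    """For each object in catalogue 1, return the index of its nearest
    neighbour in catalogue 2, or -1 if none lies within radius_arcsec."""
    v1 = radec_to_unit(ra1, dec1)
    v2 = radec_to_unit(ra2, dec2)
    # Squared chord length between every pair of unit vectors.
    d2 = ((v1[:, None, :] - v2[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argmin(d2, axis=1)
    chord = np.sqrt(d2[np.arange(len(v1)), nearest])
    # Chord length -> angular separation in arcseconds.
    sep = np.degrees(2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))) * 3600.0
    return np.where(sep <= radius_arcsec, nearest, -1)

# Toy example (hypothetical coordinates): the first spectroscopic object has
# a photometric counterpart 1 arcsec away, the second has none nearby.
spec_ra, spec_dec = [10.0, 10.01], [-45.0, -45.0]
phot_ra, phot_dec = [10.0, 50.0], [-45.0 + 1.0 / 3600.0, -50.0]
idx = match_within_radius(spec_ra, spec_dec, phot_ra, phot_dec)
```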

The final matched spectroscopic catalogue has been
cleaned of objects that we do not expect to be present
in the shear catalogue. This includes removing all stars,
strong lenses, and AGN. The matching is limited to the
0 < z < 1.8 redshift range. This means that for all the
machine learning (ML) methods used in this work the density
of n(z) above z = 1.8 will be zero, though model fitting
codes do not have this drawback. We find that artificially
cutting the n(z) at z = 1.8 for a model fitting code biases the
constraints on σ8 at the 1% level, which is sufficiently small
relative to the statistical error (see Sec. 6 for more details).



Figure 3. The normalised redshift distributions, n(z) against
redshift z, of the spectroscopic samples used in producing and
testing the photometric redshift estimates. The solid line is the
kernel density estimate (KDE; Ivezic et al. 2014) of the underlying
density. Top panel: The combined training and validation samples.
Middle panel: The independent sample (VVDS-F14). Bottom panel:
The VVDS-Deep sample.

We divide the resulting matched spectroscopic catalogue
into three samples: a training, a validation, and an
independent sample, which are compared in Fig. 3. The
independent sample contains all the matched galaxies from
the VVDS-F14 field; a total of 3,603 galaxies. This field is
spatially removed from the other spectroscopic fields, as shown
in Fig. 2, and therefore the line of sight structure within
this field is uncorrelated with that of the training and validation
sets. The use of this field will allow us to assess issues
pertaining to sample variance and radial learning in the machine
learning methods (e.g., App. D). If the redshift solution
is overtrained or subject to systematic incompleteness,
any performance metrics on a validation set with a near
identical redshift distribution to the training sample would
be too optimistic. In App. D, we demonstrate an example
of extreme selection effects in a training set based on the
PRIMUS survey, while in Sec. 3.3 we study the completeness
of the training set used in this work. The remaining
42,536 galaxies in the matched spectroscopic catalogue are
split into the training and validation samples containing,
respectively, 70% and 30% of the galaxies. This retains a total
of 28,219 galaxies in the training sample and 14,317 galaxies
in the validation sample.

2.4 COSMOS Data 

In addition to spectroscopic data from the literature, we also
make use of the point-estimated photometric redshifts from
Ilbert et al. (2009) in the COSMOS field. These photo-z estimates
were computed from 30-band photometry with the Le
Phare template fitting photometric redshift code (Arnouts
et al. 1999). The COSMOS field was observed with DECam
during the SV observing period and coadd images with a
similar total exposure time as the SV survey have been produced.
We match the catalogue extracted from these images
to the COSMOS photo-z sample, and trim to a subsample
representative of the shear catalogue. This trimming was
performed by applying cuts in the i-band FWHM - magnitude
plane as follows:

FWHM (arcsec) > 0.105 × i (mag) − 1
FWHM (arcsec) > 0.751 × i (mag) − 15.63
i > 18 (mag)

together with a surface brightness cut at
μ_eff < 28 mag arcsec⁻². These cuts approximate the
final shape catalogue selection function and allow us a
further independent estimate of the redshift distribution of
the weak lensing sample.
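
The catalogue-level cuts listed above translate directly into a boolean selection mask. The sketch below is illustrative only: the function and argument names are assumptions rather than column names from any DES data product, and the thresholds are simply those quoted in the text.

```python
import numpy as np

def shear_like_selection(fwhm_arcsec, i_mag, mu_eff):
    """Boolean mask approximating the shape catalogue selection.

    fwhm_arcsec : i-band FWHM in arcseconds
    i_mag       : i-band magnitude
    mu_eff      : effective surface brightness in mag arcsec^-2
    """
    fwhm = np.asarray(fwhm_arcsec, dtype=float)
    i = np.asarray(i_mag, dtype=float)
    mu = np.asarray(mu_eff, dtype=float)
    return ((fwhm > 0.105 * i - 1.0)
            & (fwhm > 0.751 * i - 15.63)
            & (i > 18.0)
            & (mu < 28.0))

# Toy example (hypothetical objects): one passing object, one too bright,
# and one too compact for its magnitude.
mask = shear_like_selection(fwhm_arcsec=[1.5, 1.5, 1.0],
                            i_mag=[22.0, 17.0, 23.0],
                            mu_eff=[25.0, 25.0, 25.0])
```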

2.5 Simulated SV data: the BCC-UFig 

In the following sections we will calibrate a model-based
photo-z method using a set of galaxy catalogues extracted
from simulated SV data: the BCC-UFig (Chang et al. 2014).
The latter is based on simulated DES coadd images created 
using the Ultra-Fast Image Simulator (UFig, Berge et al. 
2013). The input galaxy catalogues for these images were 
taken from the Blind Cosmology Challenge (BCC, Busha 
et al. 2013). The galaxy catalogues were then obtained by 
running source extraction and processing codes to mimic 
the pipeline run on the real DES SV data, as described in 
Chang et al. (2014) and Leistedt et al. (2015). The BCC-UFig
was shown to reliably mimic the SV data in terms of
colour, redshift, and spatial distributions of the objects, and 
also reproduce systematics observed in the reduced galaxy 
catalogues such as spatially varying depth and correlations 
with observing conditions (Chang et al. 2014; Leistedt et al. 
2015). In this paper we push the comparison further and 
consider catalogues similar to the weak lensing catalogue 
described above by making the same catalogue-level cuts as 
are used for the COSMOS data. 


3 PROPERTIES OF MATCHED SPECTROSCOPIC CATALOGUE AND TEMPLATES

Ideally, we would be able to compile a sample of spectroscopically
identified objects that are fully representative of our
target weak lensing galaxy population. If these spectroscopic
objects were sufficiently numerous and well-sampled over the
sky, then the redshift distribution of these objects could be
used in conjunction with weak lensing measurements to infer
constraints on cosmological parameters. However, even in
large samples such as the one compiled for this work, biases
remain due to spectroscopic incompleteness and difficulties
in representing all galaxies in the face of spatially varying
data quality.

In this section we investigate to what extent our existing
spectroscopic sample should reflect the underlying redshift
distribution of our photometric sample and assess the
effectiveness of weighting spectroscopic objects in correcting
for differences between the photometric and spectroscopic
galaxy populations. We pay special attention to possible biases
in the inferred probability distribution of the weak lensing
sources due to these limitations. Note that while modelling
methods do not require representative training samples,
biases may still arise if the model templates are not
a sufficiently accurate description of the data. This is analogous
to model bias in cosmic shear measurements (Voigt
& Bridle 2010; Kacprzak et al. 2014). As in cosmic shear,
we can aim to tackle these issues through simulations of the
data. Thus Secs. 3.1-3.3 address challenges related to machine
learning methods, while Sec. 3.4 discusses challenges
to using template fitting methods.

3.1 Noise properties of the matched catalogue 

A large fraction of the DES-SV galaxies that have spectra
lie in the DES supernovae fields or other fields with a significantly
longer cumulative exposure time than the SPT-E
field, which contains the galaxies used for the weak lensing
science. We show in Fig. 4 the estimated 10σ MAG_AUTO detection
limits of the matched spectroscopic catalogue compared
to that of the weak lensing sample. The 10σ detection
limits differ significantly between the samples, with the
galaxies in the matched spectroscopic catalogue having significantly
deeper detection limits on average. This poses
a problem for ML methods as they do not explicitly take
the noise measurement into account. The ML methods in
this work implicitly assume the noise properties from the
matched spectroscopic catalogue to be identical to those of
the weak lensing sample.

One way to obtain a similar depth distribution in the 
spectroscopic set is to create co-added images of the deeper 
fields using a subset of exposures with numbers similar to 
those typical in the SPT-E field, as was used in Sanchez 
et al. (2014). A second option is to algorithmically degrade 
the photometry of the matched spectroscopic catalogue for 
the bands of the galaxies with higher S/N. This is done in 
the following manner: 

(i) For every galaxy in the matched spectroscopic catalogue,
we find its nearest neighbour in four-dimensional
colour-magnitude space from the weak lensing sample.

(ii) If one or more bands of the matched galaxy have a
fainter 10σ detection limit than the weak lensing sample
detection limit in those bands, then a new magnitude is
drawn.

(iii) This new magnitude is determined according to a
normal distribution using the measured magnitude of the
spectroscopic galaxy as the mean and the error on the magnitude
of the selected neighbour in the weak lensing sample
for the variance.

The limits in image depth (10σ detection) for which we
decide to re-draw a new magnitude value are: MAG_AUTO
g = 24.5, MAG_AUTO r = 24.3, MAG_AUTO i = 23.5, and
MAG_AUTO z = 22.8. So, for a galaxy in the matched spectroscopic
catalogue that has a 10σ detection of 24.7 in i band
and a 10σ detection of 22.5 in z band we draw a new i band
magnitude and keep the original z band magnitude.
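
Steps (i)-(iii) and the depth limits above can be sketched in a few lines. This is a hedged illustration rather than the code used for this work: the function names and array layout (columns ordered g, r, i, z) are assumptions, the neighbour search is brute force, and the neighbour's magnitude error is interpreted here as the standard deviation of the normal distribution, which is one possible reading of step (iii).

```python
import numpy as np

BANDS = ("g", "r", "i", "z")
# 10-sigma MAG_AUTO depth limits quoted in the text.
DEPTH_LIMITS = {"g": 24.5, "r": 24.3, "i": 23.5, "z": 22.8}

def colour_magnitude(mags):
    """Map (N, 4) griz magnitudes to (g-r, r-i, i-z, i)."""
    g, r, i, z = np.asarray(mags, dtype=float).T
    return np.column_stack([g - r, r - i, i - z, i])

def degrade_photometry(spec_mags, spec_depth, wl_mags, wl_magerr, rng):
    """Return degraded magnitudes and the mask of re-drawn bands."""
    spec_mags = np.asarray(spec_mags, dtype=float)
    wl_magerr = np.asarray(wl_magerr, dtype=float)
    spec_cm = colour_magnitude(spec_mags)
    wl_cm = colour_magnitude(wl_mags)
    # Step (i): nearest weak lensing neighbour in colour-magnitude space.
    d2 = ((spec_cm[:, None, :] - wl_cm[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argmin(d2, axis=1)
    # Step (ii): bands whose 10-sigma depth is fainter than the limit.
    limits = np.array([DEPTH_LIMITS[b] for b in BANDS])
    redraw = np.asarray(spec_depth, dtype=float) > limits
    # Step (iii): draw around the measured magnitude, using the neighbour's
    # magnitude error as the scatter (assumed to be a standard deviation).
    noisy = rng.normal(loc=spec_mags, scale=wl_magerr[nn])
    return np.where(redraw, noisy, spec_mags), redraw

# Toy example mirroring the case in the text: depth 24.7 in i, 22.5 in z.
rng = np.random.default_rng(42)
spec_mags = np.array([[23.0, 22.0, 21.5, 21.2]])
spec_depth = np.array([[24.0, 24.0, 24.7, 22.5]])
wl_mags = np.array([[23.1, 22.1, 21.6, 21.3], [25.0, 24.0, 23.0, 22.5]])
wl_magerr = np.array([[0.05, 0.04, 0.04, 0.06], [0.30, 0.20, 0.15, 0.20]])
new_mags, redraw = degrade_photometry(spec_mags, spec_depth,
                                      wl_mags, wl_magerr, rng)
```

In the toy case only the i-band magnitude is re-drawn, in line with the worked example in the text.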

This leads to a matched spectroscopic catalogue that
has approximately the same noise properties as the weak
lensing sample. The method has some advantages over re-stacking,
one of which is that we can degrade to any other
noise level as long as the original exposures are of sufficient
depth. This is not necessarily possible with re-stacking due
to the fact that observing conditions sampled during pointings
in SPT-E cannot be recreated with those observed in
the deeper fields. To protect against potential biases introduced
by this procedure, the training and validation samples in this
work have been algorithmically degraded while the independent
field containing all the VVDS-F14 galaxies is created by
re-stacking and is identical to the reduction of the field used
in Sanchez et al. (2014). We validated that using restacked
coadds instead of resampling the magnitudes has no significant
effect on our results.

Figure 4. The 10σ MAG_AUTO detection limits of the matched spectroscopic sample (blue) compared to that of the weak lensing sample
(red). The matched spectroscopic catalogue has a significantly larger detection limit due to the fact that many DES galaxies with spectra
lie in the frequently observed DES supernova fields.

Figure 5. The i-band magnitude distribution of the matched
spectroscopic catalogue is shown in blue and the weak lensing
sample is shown in red. The matched spectroscopic catalogue after
weighting is shown as the grey histogram outline overlaying the
weak lensing sample.


3.2 Weighting of the spectroscopic set 

In the work presented here we characterise the impact of
errors in redshift estimation on weak lensing studies. Our
focus is thus on the galaxy samples selected based on our
ability to measure their shapes accurately in DES SV. Figure
5 shows the i-band magnitude distribution of the matched
spectroscopic catalogue in blue and the distribution of the
weak lensing sample from DES SV in red. The difference in
magnitude of the samples is very clear, with the matched
spectroscopic sample biased to brighter magnitudes. We account
for differences in magnitude and colour by weighting
galaxies in the spectroscopic sample in such a way that the
weighted distribution of training galaxies matches the weak
lensing source distribution. This can then be used in performance
metrics to give a better indication of the likely
errors coming from averaging over the weak lensing population.
The weights we use are calculated as in Sanchez et al.
(2014) by estimating the density of objects in the matched
spectroscopic sample in colour-magnitude space noted below,
with all objects detected in all bands, and:

−1 < g − r < 4

−1 < r − i < 4

−1 < i − z < 4

16 < i

16 < r.

We then compare this density with the density of the weak
lensing sample at the same location in colour-magnitude
space, using the ngmix catalogue. The ratio of the densities
of the weak lensing sample to the matched spectroscopic
catalogue at the location of a spectroscopic galaxy in colour-magnitude
space is calculated by counting the number of
galaxies in the weak lensing sample in a hypersphere with
radius to the 5th nearest neighbour in Euclidean space in
the matched spectroscopic catalogue. The normalised ratio
of these densities is then used as the weight for each spectroscopic
galaxy (see Lima et al. 2008 for more details on the
implementation).
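
The nearest-neighbour weighting just described can be illustrated with a short sketch. It follows the description in the text (fixed number of spectroscopic neighbours, count of weak lensing galaxies within the same hypersphere) but is not the implementation of Lima et al. (2008) or of this work: the function name is hypothetical, the neighbour search is brute force, and the toy data use a single feature dimension in place of the full colour-magnitude space.

```python
import numpy as np

def nn_weights(spec_feat, wl_feat, k=5):
    """Weights for spectroscopic galaxies from a k-th nearest neighbour
    density ratio. Inputs are (N, D) feature arrays, e.g. colours and
    magnitude; requires more than k spectroscopic galaxies."""
    spec_feat = np.asarray(spec_feat, dtype=float)
    wl_feat = np.asarray(wl_feat, dtype=float)
    # Squared distance from each spectroscopic galaxy to its k-th
    # spectroscopic neighbour; column 0 after sorting is the galaxy itself.
    d2_ss = ((spec_feat[:, None, :] - spec_feat[None, :, :]) ** 2).sum(-1)
    r2_k = np.sort(d2_ss, axis=1)[:, k]
    # Number of weak lensing galaxies inside the same hypersphere.
    d2_sw = ((spec_feat[:, None, :] - wl_feat[None, :, :]) ** 2).sum(-1)
    n_wl = (d2_sw <= r2_k[:, None]).sum(axis=1)
    # The spectroscopic count per sphere is fixed at k, so the density
    # ratio is proportional to n_wl; normalise the weights to sum to one.
    w = n_wl.astype(float)
    return w / w.sum()

# Toy example (hypothetical data): the target sample is skewed to larger
# feature values than the spectroscopic-like sample.
rng = np.random.default_rng(1)
spec = rng.uniform(0.0, 1.0, size=(400, 1))
wl = rng.beta(4.0, 2.0, size=(4000, 1))
w = nn_weights(spec, wl, k=5)
unweighted_mean = spec.mean()
weighted_mean = np.sum(w * spec[:, 0])
```

After weighting, the mean of the spectroscopic-like sample moves towards that of the target sample, which is the behaviour the weighting is designed to produce.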

Fig. 5 shows the weighted i-band distribution for the
spectroscopic sample, which better matches the ngmix catalogue.
In Fig. 6, we show g − r, r − i, and i − z for the
matched spectroscopic catalogue and weak lensing sample
on the top row while we show g − r vs r − i, r − i vs g − i and
i − z vs r − z in the bottom row. The weighted colours of the
matched spectroscopic catalogue are a good match to those
of the weak lensing sample, although we can see in the middle
panel of the bottom row that the tails of the colour distributions
of the weak lensing sample are not as well approximated.
This is due to the fact that the matched spectroscopic
catalogue only has ~40,000 galaxies while the weak lensing
sample has more than 3,000,000, hence the tails of the distributions
of the weak lensing sample are poorly sampled by
the limited number of objects in the matched spectroscopic
catalogue.

We find that 1.6% of the weak lensing sample falls outside
the range of colours sampled by our spectroscopic catalogues.
It is relatively straightforward to remove these regions,
but the results in this work are robust to the inclusion
or exclusion of these 1.6% of galaxies.

Figure 6. The colour distribution of the weighted matched spectroscopic catalogue is shown in blue relative to the weak lensing sample
in red. Top row: 1D histograms of the three colours: g − r, r − i, and i − z. Bottom row: Related 2D comparisons of the colour distributions.
In general the weighted matched spectroscopic catalogue colour distribution matches the weak lensing sample colour space well, although
in the bottom row we can see that the weighted matched spectroscopic catalogue is unable to match the tails of the weak lensing sample.


3.3 Assessing the weighted spectroscopic sample 

The weighting procedure assumes that small regions of
colour-magnitude space (pixels) populated by galaxies in the
weak lensing sample are fairly sampled in the matched spectroscopic
catalogue. If this is the case, then weighted estimates
of performance metrics will be equivalent to those obtained
from a complete spectroscopic sample (i.e., one without
biases due to a selection function or incompleteness).
However, it is possible that some galaxies live in colour-magnitude
regions where incompleteness could lead to missing
populations from the spectroscopic sample. The redshifts
of the spectroscopic sample in these regions could then be
biased relative to the full sample of DES galaxies that lie in
the same regions of colour space.

The only sizeable sample that we have access to with 
target selection of comparable depth to DES is the VVDS 
Deep survey. This sub-survey within VVDS targeted galaxies 
purely on i-band magnitude at i < 24. In order to understand 
how the incompleteness within this survey corresponds 
to the colour and brightness of the galaxy distribution, 
we break the four-dimensional colour-magnitude volume 
of the weak lensing sample (g - r, r - i, i - z and i-band 
magnitude) into cells based on a k-means clustering algorithm 
(Ivezic et al. 2014). Each cell represents approximately 
0.2% of the sample. To each of these k-means cells we assign 
objects from our weighted spectroscopic and COSMOS photometric 
redshift samples and objects targeted by the VVDS 
Deep survey. Within each four-dimensional k-means cell we 
find the fraction of the VVDS Deep targets that was 
successfully assigned a high confidence redshift (flag 3, 4, 9, 13, 
14 or 19). In Fig. 7 we show the number of VVDS Deep targets 
and the success rate (completeness) in colour-colour space 
for three ranges in i-band magnitude. Between them, these 
magnitude ranges cover the peak of the number counts in 
the shear catalogue. 
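
The per-cell success rate described above can be illustrated with a minimal Python sketch. It assumes that every VVDS Deep target already carries a k-means cell label and a redshift flag; the function name, the data layout and the toy inputs are hypothetical, and the clustering step itself is not shown. 

```python
from collections import defaultdict

# Flags treated as high-confidence redshifts, as listed in the text.
RELIABLE_FLAGS = {3, 4, 9, 13, 14, 19}


def completeness_per_cell(cell_ids, flags):
    """Return {cell_id: (n_targets, success_fraction)} for targeted objects."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_targets, n_reliable]
    for cell, flag in zip(cell_ids, flags):
        counts[cell][0] += 1
        if flag in RELIABLE_FLAGS:
            counts[cell][1] += 1
    return {cell: (n, good / n) for cell, (n, good) in counts.items()}


# Toy input (hypothetical): five targets spread over two cells.
cells = [0, 0, 0, 1, 1]
flags = [3, 1, 14, 2, 9]
result = completeness_per_cell(cells, flags)
```

In this toy case cell 0 has three targets, two of them with reliable flags, and cell 1 has two targets, one of them reliable. 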

At relatively bright magnitudes (i < 22.5) the overall 
completeness is relatively high, but even here typically 
20% or more of the targeted galaxies have no measured 
redshift. If the incompleteness arises because the clear 
spectral features of these remaining galaxies fall outside 
the spectroscopic window, then it is easy to imagine that 
the weighted redshift distribution representing this region of 
colour-magnitude space would be biased. At fainter magnitudes 
the incompleteness increases, first for the reddest objects, 
but eventually, at i > 23, the majority of subsamples 
are less than 50% complete. We cannot remove weak lensing 
galaxies in all of the incomplete cells without discarding the 
majority of our sample. Instead, we try to estimate the likely 
impact of this incompleteness and in particular whether the 
uncertainties on the inferred means are consistent with the 
rest of the uncertainties that we estimate in this work. 

In order to estimate the possible impact of incompleteness 
on the mean redshift of the population we split the 
colour space cells shown in Fig. 7 into regions we term 
‘good’ and ‘bad’. The regions are divided at a completeness 
of 65%, which is the median value of the completeness 
in the cells. We then compare the mean redshift of the 
weighted spectroscopic sample to the mean from the photometric 
redshift catalogue published by Ilbert et al. (2009) in 
the COSMOS field, ensuring we use the matched cuts from 
section 2.4. Because the spectroscopic sample 
contains many more bright objects than faint, only one quarter 
of the ~ 40,000 spectroscopic objects are contained in 
‘bad’ cells. We find that the difference in the means is 
δz = 0.013 for the ‘good’ regions and δz = 0.03 for the ‘bad’ regions. 
These differences are comparable to the expected Poisson errors 
(which alone should be at the level of 0.01) and sample variance 
(at the level of 0.03), which for a COSMOS sized survey 
dominates over Poisson errors for samples with more than 
1000 galaxies (see Appendix A of Bordoloi et al. 2010). For 
the sample as a whole we therefore do not find evidence for 
biases in the mean at the level of precision allowed by the 
samples available. 
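
As a rough illustration of the ‘good’/‘bad’ comparison, the sketch below splits two toy samples at a completeness threshold and returns the difference in mean redshift for each region. The tuple layout, the function names, the sign convention (weighted spectroscopic mean minus reference mean) and all numbers are assumptions made for the example, not values from the analysis. 

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mean_offsets(spec, ref, threshold=0.65):
    """Difference in mean redshift for 'good' and 'bad' regions.

    spec: (cell_completeness, z, weight) tuples for the weighted spectra.
    ref:  (cell_completeness, z) tuples for the reference photo-z sample.
    A region is 'good' when the completeness of its cell is >= threshold.
    """
    offsets = {}
    for region in ("good", "bad"):
        def in_region(completeness):
            is_good = completeness >= threshold
            return is_good if region == "good" else not is_good

        z_s = [z for c, z, w in spec if in_region(c)]
        w_s = [w for c, z, w in spec if in_region(c)]
        z_r = [z for c, z in ref if in_region(c)]
        offsets[region] = weighted_mean(z_s, w_s) - sum(z_r) / len(z_r)
    return offsets


# Toy samples (hypothetical numbers).
spec = [(0.80, 0.50, 1.0), (0.70, 0.60, 2.0), (0.40, 0.90, 1.0), (0.30, 1.10, 1.0)]
ref = [(0.80, 0.55), (0.70, 0.55), (0.40, 0.95), (0.30, 1.00)]
offsets = mean_offsets(spec, ref)
```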

Later, in section 6, we will see that lensing measurements 
tend to be dominated by galaxies at higher redshifts. 
These in turn tend to come from regions with lower levels 
of completeness. To study this briefly we repeat the comparison 
between the weighted spectroscopic estimates and 
the COSMOS samples by first selecting galaxies from the 
highest redshift bin that we study later (0.83 < z < 1.3, 
see Sec. 5). We find differences in means of 0.015 and 0.05 
for the good and bad regions respectively. The samples for 
this study are significantly smaller. The good regions have 
624 and 4255 galaxies in the spectroscopic and COSMOS 
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22 <i <22.5 



22.5 < i < 23 

23 < i < 23.5 

[Panels as titled; horizontal axis: g - r, vertical axis: r - i.] 

Figure 7. Spectroscopic completeness of the VVDS Deep sample 
in g - r vs r - i colour space. Each point represents the centre of a 
4-D colour-magnitude k-means cell containing a similar number 
of galaxies from the DES SV NGMIX catalogue. The size of the 
point represents the number of targeted objects, while the colour 
indicates the fraction that returned a reliable redshift. The three 
magnitude ranges (as labelled) cover the i-band magnitude range 
that contains the majority of galaxies in the weak lensing sample 
(see Fig. 1 for the distribution in the catalogues). 


samples, respectively, and the difference in their means can 
be explained by Poisson errors alone. The bad regions have 
1507 and 17322 galaxies and so the difference in the mean 
between the spectroscopic and COSMOS determinations cannot 
be fully explained by Poisson errors alone. However, as for 
the full sample considered above, the difference is similar to 
that expected from sample variance. We thus conclude that 





Figure 8. Upper panel: Colour-space distribution of weak lensing 
sample galaxies and the matched sample taken from BCC-UFig 
in logarithmic number density intervals (red and blue contours 
respectively). Over-plotted are the observer-frame colours of 
redshift-evolved galaxy templates (black lines). Here we show the 
default set of templates included in the BPZ photometric redshift 
code, restricted to 0.3 < z < 1.3 for clarity. Lower panel: The 
weak lensing and BCC-UFig samples are restricted to objects 
with BPZ-derived mean redshifts in the range 0.35 < z < 0.45. 
The bold light blue sections of the template tracks indicate the 
same redshift interval for the galaxy models. 

any errors coming from incompleteness for the studies used 
in this paper are likely to be below the 5% level. 

3.4 Biases due to template colour coverage 

An alternative to estimating redshifts empirically 
from spectroscopic training samples (e.g. via an ML technique) 
is to use a set of galaxy templates to fit for galaxy redshifts. 
By capturing the rest-frame properties of galaxy spectral 
energy distributions (SEDs), this modelling approach 
has the advantage that it can be used to interpolate over 
regions where there are gaps in spectroscopic samples and 
to extend to higher redshifts. However, as with all modelling 
approaches, there is a risk of introducing model biases if the 
templates used for the fitting are not fully representative of 
true galaxies. 

In this work we have focused on the bpz template set 
of Benitez (2000) and Coe et al. (2006),¹ which, like many 
template sets, is built for z = 0 galaxies. These do not explicitly 
1 Multiple template-fitting codes and template sets were used in 
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account for evolution of the red sequence or changing dust 
properties at high-z. The upper panel of Fig. 8 shows the 
colour space distribution of the weak lensing galaxy sample 
(red contours) and the matched BCC-UFig sample (blue 
dashed contours) compared with the observed-frame colours 
of the bpz templates redshifted over the range 0.3 < z < 1.3. 

Sharp and strong features in galaxy SEDs, such as the 
4000 Å break, create an outer envelope of template colours 
in certain colour-colour projections. Of particular importance 
to this work is where the 4000 Å break transitions 
between the g and r DECam filters, resulting in extrema 
in the colours of many templates at z ~ 0.4. The effect of 
this is shown in the lower panel of Fig. 8, where red bold 
sections of the template tracks correspond to the redshift 
interval 0.35 < z < 0.45. There is clearly a fairly large region 
of colour-colour space, to the bottom-right of the envelope 
sampled by the template set, for which the closest template 
will be at z ~ 0.4 (in this projection at least). We plot contours 
for the weak lensing and BCC-UFig samples that lie 
within this same 0.35 < z < 0.45 range, showing that indeed 
the vast majority of galaxies in this region have a redshift 
solution at z ~ 0.4. Previous efforts have in part circumvented 
this problem, even when using the same template set, by the 
addition of further photometric bands, in particular u-band. 
Expanding the wavelength coverage with additional bands 
reduces the reliance on single informative colours for redshift 
determination. In this way, potential bias introduced 
by template fitting is reduced. For the DES SV data the 
u-band is not observed, but in section 4 we show how we use 
the BCC-UFig simulations to correct, to first order, for this 
effect of the templates' colour coverage. 


4 GLOBAL PHOTO-Z BEHAVIOUR AND 
PERFORMANCE 

Given the inherent challenges and potential biases in estimating 
redshifts, we have implemented a number of independent 
methods for estimating the redshift distribution 
of the DES SV shear catalogue. Beginning with the global 
galaxy distribution, we adopt three approaches. The first is 
an empirical approach based on machine learning methods 
using spectroscopic training. The second approach is model-based 
and uses a combination of galaxy templates and calibration 
using the BCC-UFig simulations. Finally we also 
estimate the galaxy distribution by matching to COSMOS 
photo-z data. Agreement between the results can give us 
confidence that possible systematic errors are subdominant, 
and the level of discrepancy gives an indication of the level 
of uncertainty that propagates through to later cosmological 
constraints. 

• Empirical spectroscopic: Several machine learning 
photo-z methods have been explored within the DES collaboration, 
some of which have been previously described 
in Sanchez et al. (2014). In the work that follows we focus 
on a subset of these methods, namely annz2, skynet and 
tpz, which are described in more detail in Appendix C. We 


the preparation of this work, though we present a single choice 
for brevity. 


note that tpz and skynet do not use the weights in training, 
while ANNZ2 calculates its own weights that it uses in 
training. 

• Modelling: For the model-based approach we have implemented 
the template based method bpz. We construct the 
prior as described in Benitez (2000) by fitting to the training 
sample of the weighted matched spectroscopic catalogue. 
Using the same prior presented in Sanchez et al. (2014) has 
little impact on the results. To calibrate this method we employ 
a simple first-order correction by applying weak lensing 
selection cuts to the BCC-UFig catalogues (see section 2.5) 
and measuring the offset of the mean redshift between these 
galaxies and that estimated from the pure BPZ n(z). We find 
this offset to be δz = 0.050.² This offset is applied as a shift to all 
the bpz results below, i.e., n(z) → n(z - δz), unless stated 
otherwise, and is designed to counteract, to leading order, 
the effect of the peak at z ~ 0.4 due to the template coverage 
issues (see section 3.4) that is present in both the SV data 
and simulations. 

• Empirical photometric: The COSMOS field has 
been observed using DECam and processed through the 
DES Data Management pipeline to produce coadd images of 
similar depth to the main SV survey field. Galaxies detected 
in these images are matched to the Ilbert et al. (2009) photo-z 
catalogues and then cuts designed to replicate weak lensing 
selection are applied, as outlined in section 2.4. Though the 
photo-z estimates for the COSMOS galaxies are far better 
than those we can derive from the 5 DES bands, this approach 
is limited by sample variance. 
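
The first-order bpz calibration in the Modelling item, a rigid shift of n(z) by an offset of 0.050, can be sketched in a few lines of Python. The binned representation, the helper names and the toy histogram below are assumptions made for the example; only the shift itself comes from the text. 

```python
def mean_redshift(z_centres, nz):
    """Mean of a binned n(z); nz does not need to be normalised."""
    return sum(z * n for z, n in zip(z_centres, nz)) / sum(nz)


def shift_nz(z_centres, nz, delta_z):
    """Apply n(z) -> n(z - delta_z).

    The bin heights are unchanged and every bin centre moves up by delta_z,
    so the mean of the distribution increases by exactly delta_z.
    """
    return [z + delta_z for z in z_centres], list(nz)


# Toy histogram (hypothetical numbers).
z_centres = [0.35, 0.45, 0.55, 0.65]
nz = [1.0, 3.0, 2.0, 1.0]
z_shifted, nz_shifted = shift_nz(z_centres, nz, 0.05)
shift_in_mean = mean_redshift(z_shifted, nz_shifted) - mean_redshift(z_centres, nz)
```

A rigid shift changes the mean but leaves the width and shape of n(z) untouched, which is why it only counteracts the template-coverage effect to leading order. 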

For all the results presented in the sections that follow, we 
retain only galaxies with 0.3 < z_SkyNet < 1.3. Redshifts of galaxies 
outside this range are poorly estimated and have very 
little impact on the lensing measurements. Galaxies at low 
redshift have little lensing signal and there are so few at 
higher redshift that they can be dropped from the analysis. 
The redshift cuts are made using the skynet mean, since 
we have baselined this method as our default, but the results 
that we present are robust to this choice. 

The lower panel of Fig. 9 shows our reconstruction of 
the n(z) for the DES SV weak lensing sample. The yellow 
curve comes from the weighted validation set spectra, which 
is in effect also an estimator of the global distribution. We 
also show the results of the three machine learning methods, 
the modelling based method using bpz and BCC-UFig and 
the matched COSMOS results. The vertical lines in the plot 
show the means of the distributions, which are also listed in 
Table 2. We focus on the mean since it is well known that 
uncertainty in the mean is the first order cause of systematic 
errors in weak lensing (Amara & Refregier 2007). Later, in 
section 6, we will propagate the full errors through to weak 
lensing statistics and σ8. We see that all of our estimates of 
the global distribution of galaxies give comparable results 
and we estimate the mean to be 0.72 with a precision better 
than 0.02. As a further test, we also show results when 
we apply the same procedure to the unweighted validation 

2 Though the BCC-UFig sample is colour matched to the weak 
lensing sample after performing the initial weak lensing cuts, this 
does not influence the correction. If we do not colour-match the 
BCC-UFig sample to the weak lensing sample, we find an offset 
of 0.049. 


[Figure 9 panels. Upper: Validation Set, 0.3 < z_phot,SkyNet < 1.3. Lower: Weak Lensing Sample (NGMIX), 0.3 < z_phot,SkyNet < 1.3. Horizontal axis: redshift (z).] 



Figure 9. The full redshift distribution n(z) for the validation 
sample (0.3 < z < 1.3). Upper panel: The kernel density estimate 
of the full unweighted validation sample compared to the 
four photo-z methods. Lower panel: The same, but including the 
weighting from Sec. 3.2 and matched COSMOS photometric redshifts 
from Ilbert et al. (2009). The vertical lines in the plots are 
the mean values of the distributions. 



                   DES SV-WL sample    Validation sample 
Spectra            0.72 (weighted)     0.64 
annz2              0.73                0.65 
SKYNET             0.73                0.65 
TPZ                0.73                0.64 
BPZ                0.71                0.64 
Matched COSMOS     0.70                - 


Table 2. The left column contains the estimates of the mean 
of redshift distribution of the NGMIX sample of the four photo-z 
methods and also the mean of the weighted spectroscopic sample 
which is itself an estimate of the mean of the NGMIX sample. The 
right column contains the mean of the unweighted validation set 
with the four photo-z methods and the mean from the spectra. 


sample. Here we take the spectroscopic sample to be a truth 
catalogue and we can see again that our methods are able 
to find the mean of this distribution to a precision better 
than 0.01. The corresponding means for these results are 
also shown in Table 2. 


5 TOMOGRAPHIC PHOTO-Z 
PERFORMANCE 

In the previous section, we discussed the global characteristics 
of the estimated n(z). In the cosmological analysis of 
DES et al. (2015), we have presented a conservative analysis 
of the two-point cosmic shear constraints on cosmology by 
marginalising over a large array of nuisance parameters related 
to known or suspected systematics. Particularly in the 
case of intrinsic alignment, doing so severely degrades the 
constraining power of a non-tomographic analysis. Thus we 
must also characterise how well the four photo-z methods 
are able to reconstruct the redshift distribution of individual 
tomographic bins: in this case, three bins selected to 
match those used in Becker et al. (2015); DES et al. (2015). 
These are designed to contain approximately equal lensing 
weight in the larger NGMIX shear catalogue. The bin boundaries 
are set by cuts on the skynet mean redshifts at [0.3, 
0.55, 0.83, 1.3]. We choose to keep the galaxies in each bin 
fixed according to the cosmology analysis of DES et al. (2015). 
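
The bin assignment can be written down compactly. The sketch below maps a skynet mean redshift to one of the three tomographic bins using the boundaries quoted above; the function name and the treatment of galaxies that fall exactly on a boundary (half-open intervals, lower edge inclusive) are assumptions, since the text does not specify them. 

```python
from bisect import bisect_right

# Bin boundaries on the skynet mean redshift, as quoted in the text.
BIN_EDGES = [0.3, 0.55, 0.83, 1.3]


def tomographic_bin(z_mean, edges=BIN_EDGES):
    """Return the bin number (1, 2 or 3), or None outside [0.3, 1.3)."""
    if not (edges[0] <= z_mean < edges[-1]):
        return None
    return bisect_right(edges, z_mean)


bins = [tomographic_bin(z) for z in (0.45, 0.67, 1.00, 0.20, 1.30)]
```

Because every photo-z code is evaluated on the same skynet-selected galaxies, differences between the codes within a bin reflect their n(z) estimates rather than different bin memberships. 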

In this section we look at the photo-z performance in 
these three tomographic bins. This is done through a series 
of tests, comparing the reconstruction of n(z) (and in particular 
the value of the mean redshift) in three spectroscopic 
galaxy samples and the NGMIX catalogue: 

• Test 1: An independent sample of spectroscopic galaxies 
in the VVDS-F14 field, which were not used in training or 
validation and are located in a distinct part of the sky, separate 
from the training and validation fields. The radial structure 
in the independent sample is thus different from what the 
machine learning methods were trained on. 

• Test 2a: A deeper spectroscopic sample of 30% of the 
galaxies in the VVDS-Deep field, which matches better to 
the depth of DES SV photometry, but which is also part of 
the validation sample and thus not fully independent. 

• Test 2b: The full validation sample (30% of the 
matched spectroscopic sample set), excluding galaxies in 
the VVDS-F14 field. 

• Test 3: Comparison of the redshift estimates of the four 
photo-z methods for the full DES SV NGMIX catalogue. 

Once again, we use skynet as the fiducial photo-z result, 
and so for consistency all objects in this section are 
assigned a bin based on the mean of the skynet p(z). In Appendix B, 
we show results where each code assigns a bin to 
each galaxy based on its own z-mean. Figure 10 shows the 
results in the tomographic bins of tests 1, 2a and 2b for each 
of the photo-z algorithms we consider, as labelled. Overall we 
see that all the methods produce consistent results. Since we 
do not have a perfectly representative spectroscopic sample 
for the galaxy population of the full NGMIX catalogue, we 
only compare the relative agreement of the photo-z methods 
in the bottom panel of Fig. 10. The bin with the highest cosmological 
information content for tomographic lensing is the 
highest redshift bin. It is therefore reassuring that visually 
the different methods give consistent results. Table 3 shows 
the mean offsets of the results shown in the top 3 panels 
of Fig. 10. Table 4 shows the estimates of the mean in the 
tomographic bins of the NGMIX sample by the photo-z codes 
and the estimate of the weighted spectroscopic sample. We 
see from the results for Tests 2b and 3, which are the closest 
to our weak lensing samples, that the relative bias of the 
Test       z range        annz2     BPZ       SKYNET    TPZ 
Test 1     0.30 - 0.55    0.014     0.001     0.003     0.008 
           0.55 - 0.83    0.019     -0.002    0.017     0.017 
           0.83 - 1.30    0.033     0.057     0.063     0.039 
Test 2a    0.30 - 0.55    0.139     0.072     0.027     0.079 
           0.55 - 0.83    0.069     0.027     0.034     0.042 
           0.83 - 1.30    0.002     -0.026    0.044     0.016 
Test 2b    0.30 - 0.55    0.064     0.032     0.012     0.033 
           0.55 - 0.83    0.027     -0.010    0.013     0.010 
           0.83 - 1.30    -0.030    -0.045    0.022     -0.016 


Table 3. The bias (<z_phot> - <z_spec>) between the photometric 
redshift estimates and the true spectroscopic distribution in Test 
1 (‘independent’), Test 2a (VVDS-Deep) and Test 2b (Full validation 
set). 


z range        Spec (weighted)    annz2    BPZ     SKYNET    TPZ 
0.30 - 0.55    0.45               0.49     0.46    0.45      0.46 
0.55 - 0.83    0.67               0.69     0.64    0.67      0.67 
0.83 - 1.30    1.00               0.98     0.97    1.02      1.01 


Table 4. The estimated mean of the three tomographic bins in 
the NGMIX sample of the four photo-z methods and the estimate 
of the weighted spectroscopic sample. 

means are broadly consistent with Gaussian scatter of width 
0.05. 


6 IMPLICATIONS FOR WEAK LENSING 

The mapping of traditional photo-z metrics to actual impacts 
on the weak lensing measurements and cosmological 
parameter constraints is non-trivial, and the resulting bias 
can be difficult to capture using simple metrics. In this section 
we explore the impact of photo-z uncertainty by propagating 
the errors through the two-point correlation function 
and to the cosmological parameter σ8 and to measurements 


6.1 Photo-z impact on two-point cosmic shear 
analysis 

The photo-z n(z) impacts the predicted correlation function 
(and thus constraints on cosmological parameters) through 
the lensing efficiency when modelling the convergence power 
spectrum C(ℓ). The tomographic correlation function ξ+/- 
is related to C(ℓ) through the zeroth (fourth) order Bessel 
function of the first kind by 

\xi_{+/-}^{ij}(\theta) = \frac{1}{2\pi} \int d\ell \, \ell \, C_{ij}(\ell) \, J_{0/4}(\ell\theta),   (1) 

where (i, j) ∈ {1, 2, 3} represent the redshift bins in the auto- 
or cross-correlation. C_ij(ℓ) is then defined as 

C_{ij}(\ell) = \int_0^{\chi_H} d\chi \, \frac{W_i(\chi) W_j(\chi)}{\chi^2} \, P_\delta\left(\frac{\ell}{\chi}, \chi\right),   (2) 

for comoving distance χ, horizon distance χ_H, matter power 
spectrum P_δ, and lensing efficiency, given in a flat universe 
as 

W_i(\chi_l) = \frac{3 H_0^2 \Omega_m}{2 c^2} (1 + z_l) \, \chi_l \int_{\chi_l}^{\chi_H} d\chi_s \, n_i(\chi_s) \, \frac{\chi_s - \chi_l}{\chi_s}.   (3) 

The redshift distribution of galaxies is normalised such that 
∫ n_i(χ) dχ = 1, H_0 is the Hubble parameter, and Ω_m is the 
matter density parameter at z = 0. 
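
To make the role of n(z) in the lensing efficiency concrete, the following sketch evaluates the flat-universe lensing efficiency on a grid of comoving distances with the trapezoidal rule. It assumes the standard form in which the prefactor 3 H_0^2 Ω_m / (2 c^2) multiplies (1 + z_l) χ_l and the integral over sources behind the lens; the function names, the grid, and the choice to pass the prefactor in as a single number are assumptions made for the example. 

```python
def trapezoid(y, x):
    """Trapezoidal integral of samples y on grid x (0 for a single point)."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def lensing_efficiency(chi, z, n_chi, prefactor):
    """Lensing efficiency W(chi_l) at every grid point, flat universe.

    chi:       increasing comoving distances, all > 0
    z:         redshift corresponding to each chi
    n_chi:     source distribution n(chi); normalised here to unit integral
    prefactor: 3 H0^2 Omega_m / (2 c^2), in units consistent with chi
    """
    norm = trapezoid(n_chi, chi)
    w = []
    for l in range(len(chi)):
        # Only sources at or behind the lens contribute.
        integrand = [
            n_s / norm * (chi_s - chi[l]) / chi_s
            for chi_s, n_s in zip(chi[l:], n_chi[l:])
        ]
        w.append(prefactor * (1.0 + z[l]) * chi[l] * trapezoid(integrand, chi[l:]))
    return w


# Toy grid (hypothetical numbers): all sources sit at chi = 400.
chi = [100.0, 200.0, 300.0, 400.0, 500.0]
z = [0.02, 0.05, 0.07, 0.10, 0.13]
n_chi = [0.0, 0.0, 0.0, 1.0, 0.0]
w = lensing_efficiency(chi, z, n_chi, prefactor=1.0)
```

Moving the sources to larger χ raises the efficiency at every lens distance in front of them, which is how an error in the mean of n(z) feeds into the predicted correlation function. 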

The predicted ξ+/- (both tomographic and non-tomographic) 
are calculated over the θ range and tomographic 
binning used for the measurements in Becker et al. 
(2015) for each photo-z estimate and the weighted matched 
spectroscopic sample. We then use these predicted correlation 
functions with the covariance matrix from Becker et al. 
(2015) to propagate the differences between photo-z estimates 
through to constraints on σ8 (with all other parameters 
fixed). The ‘truth’ (or measurement of ξ+/- with no 
systematic uncertainties) is taken to be either the fiducial 
SKYNET prediction in Sec. 6.1.1 or the weighted matched 
spectroscopic sample in Sec. 6.1.2, while each photo-z estimate’s 
predicted ξ+/- is taken to be the assumed theory in 
turn when constraining σ8. The final results of this comparison 
for the four photo-z estimates presented in this work 
are shown in Figs. 11 - 13. 
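
A much simplified version of this propagation can be sketched as a one-parameter amplitude fit. The example assumes that the correlation function scales as σ8 squared and that the covariance is diagonal; both are simplifications introduced here for illustration (the analysis in the text uses the full covariance matrix from Becker et al. 2015), and the function name and toy numbers are hypothetical. 

```python
import math


def best_fit_sigma8_ratio(xi_truth, xi_theory, variance):
    """Best-fit sigma8 / sigma8_fid under the assumption xi ∝ sigma8^2.

    Minimises chi^2 = sum_k (d_k - A t_k)^2 / var_k over the amplitude A,
    where d is the 'truth' data vector and t the assumed theory prediction
    at the fiducial sigma8, then returns sqrt(A).
    """
    num = sum(d * t / v for d, t, v in zip(xi_truth, xi_theory, variance))
    den = sum(t * t / v for t, v in zip(xi_theory, variance))
    return math.sqrt(num / den)


# Toy data vector (hypothetical): the assumed theory is 6% high in xi.
xi_truth = [1.0e-5, 6.0e-6, 3.0e-6]
xi_theory = [1.06 * x for x in xi_truth]
variance = [(0.1 * x) ** 2 for x in xi_truth]
ratio = best_fit_sigma8_ratio(xi_truth, xi_theory, variance)
```

Under this scaling a theory prediction that is 6% too high in the correlation function pulls the inferred σ8 down by roughly 3%, the same order as the shifts discussed in this section. 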


6.1.1 Comparison of photo-z estimates for the DES SV 
shear catalogue 

For the full photometric galaxy sample contained within the 
shear catalogue, we have no estimate for the true value of the 
n(z) to compare to and so instead compare to the fiducial 
SKYNET prediction as a relative point of reference. We can 
therefore only compare the relative agreement between the 
photo-z codes shown for the ngmix catalogue in Fig. 11. 

In the left panel, the relative agreement in the magnitude 
of ξ+ is shown, averaged over θ.³ The left set of points 
shows the non-tomographic ξ+, while the middle and right 
sets of points show the three auto- and cross-correlations, respectively. 
The grey bands show the 1σ error on the magnitude 
of the measured ξ+ for each correlation function, using 
the covariance calculated in Becker et al. (2015). The relative 
agreement in ξ+ between the machine learning methods 
is very good in correlations with the highest tomographic bin 
(‘33’, ‘23’, and ‘13’). The disagreement increases significantly for 
correlations with the lower tomographic bins (‘11’, ‘22’, and ‘12’), 
though the non-tomographic case also has good agreement, 
on the order of 5%. bpz tends to disagree with the machine 
learning methods, typically at the 5-10% level. 

The right panels of Fig. 11 show the corresponding 
constraints on σ8. The SKYNET prediction is normalised to 
one (vertical dotted black line). The likelihood histogram, 
coloured to match the points in the left panel for each photo-z 
code, is shown for the full tomographic constraint, while 
the vertical solid black line gives the peak of the likelihood 
histogram for the non-tomographic constraint. The bias in 
constraints on σ8 between the machine learning photo-z 






3 The major results are unchanged when instead considering specific 
values of θ. 
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Test-1 VVDS-F14 

n_spec = 1460 (0.3 < z_phot < 0.55), 1246 (0.55 < z_phot < 0.83), 250 (0.83 < z_phot < 1.3) 

Test-2a VVDS-Deep Validation 
n_spec = 565 (0.3 < z_phot < 0.55), 845 (0.55 < z_phot < 0.83), 812 (0.83 < z_phot < 1.3) 

Test-2b Full Validation 
n_spec = 4150 (0.3 < z_phot < 0.55), 4221 (0.55 < z_phot < 0.83), 2324 (0.83 < z_phot < 1.3) 

Tomographic bins, NGMIX sample 
(same three z_phot ranges; horizontal axis of every panel: redshift z) 


Figure 10. Each row of panels shows the weighted spectroscopic redshift distributions (shaded area) of the objects in each tomographic 
bin, as selected by the mean of SKYNET, compared to estimates of the redshift distribution from the four methods used in this work. Top 
row: The spectra used in this test come from VVDS-F14, an independent sample not used for training. Second row: The spectra 
used in this test are a 30% subset of VVDS-Deep used as part of the validation sample. Third row: The spectra used in this test are a 
30% subset of the matched spectroscopic catalogue used for validation. Bottom row: The redshift distribution in the tomographic bins 
for the NGMIX sample. 


methods is very small despite low-z differences in the correlation 
function, with agreement at much better than the 1σ 
level. bpz has a relative bias of about 1σ, by comparison, 
which corresponds to about 3% in σ8. 

For completeness, we have also repeated the above analyses 
and those in Sec. 6.1.2 on the im3shape n(z) with the 
same redshift boundaries matching those derived for NGMIX 
and again for tomographic bins derived for im3shape, and 
find in all cases that the major conclusions and resulting 
differences across photo-z methods are consistent between 
analyses of the two catalogues at the level of accuracy we 
require for SV analysis. 
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Figure 11. A comparison of the relative agreement of the n(z) 
estimates for annz2, bpz, skynet, and tpz for the ngmix shear 
catalogue. Left panel: The relative magnitude of the correlation 
function compared to the spectroscopic n(z) prediction is shown 
for the non-tomographic ξ+, the three auto-correlations, and the 
three cross-correlations. The grey band is the actual variance in 
the magnitude of ξ+ measured from SV data. Right panels: The 
corresponding constraints on σ8, with fiducial skynet results normalised 
to one (vertical dotted black line). The likelihood histograms, 
colour-coded to match the ξ+ points on the left, are 
shown for each tomographic constraint. The peak of the likelihood 
histogram for the non-tomographic constraint is given by 
the vertical black line for comparison. The vertical ordering is 
the same as the legend in the left panel. 


6.1.2 Null tests relative to matched spectroscopic samples 

One difficulty with the results in Sec. 6.1.1 is that we have 
no way of determining what the true n(z) is, and thus can 
only compare relative agreement between photo-z methods. 
We can, however, create an experiment in which the n(z) 
is known to be exactly that of our weighted independent 
spectroscopic sample (Test 1). We then repeat the analysis 
from Sec. 6.1.1 for this test as an additional way of characterising 
systematic photo-z uncertainties. Though there are 
only 2956 galaxies in the independent spectroscopic sample 
within our 0.3 < z < 1.3 boundaries, we assume the 
estimated n(z) from each code and the spec-z distribution 
instead represent a sample with the same number of objects 
as the ngmix catalogue. These redshift distributions (see top 
panel of Fig. 10) are used to measure the relative difference in 
ξ+/- compared to the spectroscopic prediction as in Sec. 
6.1.1. We also calculate error bars on the points, which represent 
the 1σ error in the difference from bootstrapping the 
n(z) of the sample. Since we are comparing the matched photometric 
and spectroscopic n(z) distributions for the same 
galaxies contained within the VVDS-F14 field, there is no 
sample variance contribution to these error bars. However, 
since it is a small field separate from the DES SV SPT-E region, 
any extrapolation of the bias to the full DES SV shear 
catalogue could still be over- or under-estimated. 
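
The bootstrap idea can be illustrated on a simpler statistic than the correlation function. The sketch below resamples a list of redshifts with replacement and reports the scatter of the resampled means; applying the same resampling to the full n(z) and recomputing ξ+ each time, as described above, is not shown. The function name, the seed, the number of resamples and the toy sample are assumptions made for the example. 

```python
import random
from statistics import mean, stdev


def bootstrap_mean_error(z_values, n_resamples=500, seed=0):
    """1-sigma bootstrap error on the mean of z_values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(z_values)
    means = [mean(rng.choices(z_values, k=n)) for _ in range(n_resamples)]
    return stdev(means)


# Toy sample (hypothetical): 200 redshifts spread evenly over 0.3 to 1.3.
z_sample = [0.3 + 0.005 * i for i in range(200)]
error = bootstrap_mean_error(z_sample)
```

For this toy sample the result should be close to the analytic standard error of the mean, roughly 0.02. 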

We show the results of this analysis in Fig. 12. The bias 
in ξ+ relative to the spectroscopic prediction for the three 
machine learning codes (annz2, skynet, and tpz) is shown 
in the left panel. It is in good agreement and consistent 
across the correlations at about 5-10% larger than the 


Figure 12. A comparison of the relative agreement of the n(z) 
estimates for ANNZ2, bpz, skynet, and tpz to the weighted independent 
spectroscopic galaxy sample. Left panel: The relative 
magnitude of the correlation function compared to the spectroscopic 
n(z) prediction is shown for the non-tomographic ξ+, the 
three auto-correlations (11, 22, 33 bin pairs), and the three 
cross-correlations (12, 13, 23 bin pairs). The grey band is the actual 
variance in the magnitude of ξ+ measured from SV data. Error 
bars on the points are the 1σ error on the difference of ξ+ obtained 
from bootstrapping the n(z) of the spectroscopic sample. 
Right panels: The corresponding constraints on σ8, normalised to 
one (vertical dotted black line). The likelihood histograms, colour-coded 
to match the ξ+ points on the left, are shown for each tomographic 
constraint. The peak of the likelihood histogram for 
the non-tomographic constraint is given by the vertical black line 
for comparison. The vertical grey band is the corresponding 1σ 
bootstrap error in σ8. 



Figure 13. A comparison of the relative agreement of the n(z) 
estimates for annz2, bpz, skynet, and tpz to the weighted ’deep’ 
spectroscopic galaxy sample, showing the same information as 
described in Fig. 12. 

spectroscopic prediction. This is consistent with the machine 
learning codes producing too wide p(z) or over-estimated 
high-z tails, both of which can bias ξ+ high. The empirically 
corrected bpz photo-z estimates perform similarly, with a 
maximum bias in ξ+ of 10% in the highest redshift auto-correlation. 
Test         p1 \ p2    annz2            BPZ              SKYNET           TPZ 
Test 1       annz2      -                -0.36 (0.04)     -0.22 (0.02)     -0.03 (0.01) 
             BPZ        0.36 (-0.04)     -                0.13 (-0.01)     0.33 (-0.03) 
             SKYNET     0.22 (-0.02)     -0.13 (0.01)     -                0.2 (-0.01) 
             TPZ        0.03 (-0.01)     -0.33 (0.03)     -0.2 (0.01)      - 
Test 2a      annz2      -                3.94 (0.1)       7.02 (0.04)      5.2 (0.06) 
             BPZ        -3.94 (-0.1)     -                3.08 (-0.07)     1.26 (-0.04) 
             SKYNET     -7.02 (-0.04)    -3.08 (0.07)     -                -1.82 (0.02) 
             TPZ        -5.2 (-0.06)     -1.26 (0.04)     1.82 (-0.02)     - 
Test 2a      annz2      -                0.08 (0.02)      0.08 (0.02)      0.04 (0.01) 
Corrected    BPZ        -0.08 (-0.02)    -                -0.0 (0.0)       -0.04 (-0.01) 
             SKYNET     -0.08 (-0.02)    0.0 (-0.0)       -                -0.04 (-0.01) 
             TPZ        -0.04 (-0.01)    0.04 (0.01)      0.04 (0.01)      - 


Table 5. Values of ln K for the Bayes factor K = Pr(D|p1)/Pr(D|p2) are shown for each photo-z estimate (p1 - rows) compared to 
another (p2 - columns) when constraining the value of σ8 (all other cosmology is kept fixed, varying only the estimates of n(z) between 
p1, p2, and D). The values for tomographic (non-tomographic) analyses in Figs. 12, 13, and the right panel of 14 are given. The Bayes 
factor gives an indication of how much more supported one photo-z estimate (p1) is than another (p2) by the data D, in this case the 
predicted correlation function built from the weighted spectroscopic estimate of n(z). A value ln K > 1 generally indicates that p1 is 
more strongly supported as the true photo-z estimate. 


The right panels of Fig. 12 show the corresponding constraints 
on σ8. The weighted spectroscopic prediction is normalised 
to one (vertical dotted black line) and the vertical 
grey band is the 1σ bootstrap error corresponding to the error 
bars on the ξ+ points. Note, however, that discussion of 
deviations in σ8 will refer primarily to the marginalised constraints 
unless specifically referring to the bootstrap error. 
The tomographic and non-tomographic constraints agree 
well. All four photo-z estimates are biased slightly low by 
just less than 1σ. It is important to note that due to the 
small sample size in the independent spectroscopic test sample, 
the 1σ bootstrap error in σ8 just due to sample variance 
in the independent spectroscopic sample is of the same order 
as the 1σ constraints on σ8 in DES SV for some methods. 
Overall, we find a level of systematic bias from this test in 
σ8 of 1-3%. 

We can further diagnose the performance of the photo-z codes' estimates of the n(z) by
considering the Bayes factor

$$K = \frac{\Pr(D|p_1)}{\Pr(D|p_2)}, \qquad (4)$$

where Pr is the posterior probability of the model $p_i$ due to some photo-z estimate in the
$\sigma_8$ constraints of Fig. 12. In this analysis, $D$ refers to the predicted $\xi_{+/-}$ for
the weighted matched spectroscopic samples, and Pr is the integrated posterior likelihood.
The Bayes factor can be used to compare how well two models are supported by the data. A
value $\ln K > 1$ supports $p_1$ over $p_2$, with $p_1$ being substantially supported when
$\ln K > 3$. The Bayes factor is given for each combination of photo-z estimates in Table 5.
The Bayes factors from the tomographic analysis are given first, with the non-tomographic
Bayes factors shown in parentheses for comparison. We find that there is no significant
preference for one photo-z code over another for the independent sample (Test 1), though
there is some evidence that ANNZ2 does slightly worse and the corrected BPZ slightly better.
This distinction is lost, however, for the non-tomographic analysis, which is unable to
differentiate the photo-z estimates.
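As an illustration of how $\ln K$ in Eq. (4) can be evaluated, the sketch below integrates a one-dimensional likelihood over $\sigma_8$ for two photo-z estimates and takes the difference of the log-evidences. It is a minimal sketch under stated assumptions, not the pipeline used for Table 5: the flat prior over the grid, the trapezoid integration, and the function names `log_evidence` and `ln_bayes_factor` are all hypothetical.

```python
import numpy as np

def log_evidence(sigma8_grid, log_like):
    """Log of the likelihood integrated over sigma8 with a flat prior (assumed)."""
    sigma8_grid = np.asarray(sigma8_grid, dtype=float)
    log_like = np.asarray(log_like, dtype=float)
    shift = log_like.max()                      # subtract the peak to avoid underflow
    like = np.exp(log_like - shift)
    # trapezoid rule written out explicitly
    integral = np.sum(0.5 * (like[1:] + like[:-1]) * np.diff(sigma8_grid))
    prior_width = sigma8_grid[-1] - sigma8_grid[0]
    return shift + np.log(integral / prior_width)

def ln_bayes_factor(sigma8_grid, log_like_p1, log_like_p2):
    """ln K = ln Pr(D|p1) - ln Pr(D|p2); positive values favour p1."""
    return (log_evidence(sigma8_grid, log_like_p1)
            - log_evidence(sigma8_grid, log_like_p2))
```

With this sign convention, swapping the two estimates flips the sign of $\ln K$, which is the antisymmetry visible between the upper and lower triangles of Table 5.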

We also want to compare the photo-z performance of the four codes for a set of
spectroscopic redshifts that better match the depth of the DES SV data. Figure 13 instead
compares the correlation function and $\sigma_8$ constraints for the photo-z estimates of
galaxies in the weighted 'deep' spectroscopic sample of Test 2a. The predicted n(z) for
these galaxies is shown in the third panel in Fig. 10. All four codes perform more poorly
for this 'deep' sample compared to the analysis of Test 1 in Fig. 12, with a greater spread
in the magnitude of the predicted $\xi_+$ relative to the spectroscopic prediction. SKYNET is
the most stable across tomographic bins, with a spread in bias values limited to around 5%.
The other codes scatter to a much wider range of values. For the lower bins in particular,
there is significant bias in $\xi_+$.

The corresponding $\sigma_8$ constraints are driven by information in the highest redshift
bin, however, and have a more reasonable bias about the weighted spectroscopic prediction.
The four photo-z estimates still agree with the matched spectroscopic prediction for the
VVDS-Deep sample within $1\sigma$, except for ANNZ2, which is biased at the 2-3$\sigma$ level.
The large range of bias between the lowest and highest redshift bins also produces a nearly
$1\sigma$ tension between the tomographic and non-tomographic constraints for ANNZ2. This bias
is not explained as an artefact of selecting the binning of galaxies based on SKYNET, as
seen in App. B. We present the associated Bayes factor values in Table 5, where SKYNET is
significantly favoured over the other three codes. There is again no distinction between the
codes, however, in the non-tomographic analysis from the Bayes factor. Overall, we find a
maximum level of systematic bias from this test in $\sigma_8$ of 7% for ANNZ2, though the bias
in the other methods is similar to the level found in Test 1.


Figure 14. The effect on Fig. 13 of applying a bias correction to the mean of the n(z) of each photo-z estimate by comparison to the
true spectroscopic n(z). The left side fixes a single bias parameter for the three tomographic bins, while the right side allows a different
bias parameter for each bin. Each side shows the same information as described in Fig. 12.


6.2 Validation of priors for photo-z bias parameters

To first order, we can correct for the systematic redshift biases shown in section 6.1 with
the approximation $n_i(z) \rightarrow n_i(z - \delta z_i)$, where $\delta z_i$ is the bias on the
mean redshift of the source galaxies in the appropriate tomographic bin. In the cosmology
analysis of DES et al. 2015, we adopt a Gaussian prior of width 0.05 on the allowed bias
values based on comparisons of the four photo-z methods' estimates of the n(z) discussed in
Secs. 4 & 5. This is shown explicitly in Fig. 14, where we compare the impact such a
correction scheme has on $\xi_+$ and $\sigma_8$. The bias parameters by which the n(z) are
shifted are not marginalised over here, but instead are taken from Table 3 for Test 2a,
since we can directly calculate the bias.
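The first-order correction $n_i(z) \rightarrow n_i(z - \delta z_i)$ can be sketched as a simple interpolation on a redshift grid. This is an illustrative sketch only: the function names `shift_nz` and `mean_redshift`, the linear interpolation, and the choice to set n(z) to zero outside the tabulated range are assumptions, not a description of the analysis code.

```python
import numpy as np

def shift_nz(z, nz, delta_z):
    """Return n(z - delta_z) on the same grid, renormalised to unit area."""
    z = np.asarray(z, dtype=float)
    nz = np.asarray(nz, dtype=float)
    # the new distribution at z takes the old value at z - delta_z
    shifted = np.interp(z - delta_z, z, nz, left=0.0, right=0.0)
    area = np.sum(0.5 * (shifted[1:] + shifted[:-1]) * np.diff(z))
    return shifted / area

def mean_redshift(z, nz):
    """Mean of a tabulated n(z), using the trapezoid rule."""
    z = np.asarray(z, dtype=float)
    nz = np.asarray(nz, dtype=float)
    zn = z * nz
    num = np.sum(0.5 * (zn[1:] + zn[:-1]) * np.diff(z))
    den = np.sum(0.5 * (nz[1:] + nz[:-1]) * np.diff(z))
    return num / den
```

With this sign convention a positive $\delta z_i$ moves the distribution to higher redshift, so the mean of the shifted n(z) increases by $\delta z_i$ as long as the distribution is well contained in the grid.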

We find that a single mean redshift bias parameter is sufficient to resolve the bias in
$\sigma_8$ for all four codes. Taking into consideration the $1\sigma$ bootstrap error in the
$\xi_+$ ratio, all the tomographic correlations are consistent with zero remaining bias in
$\xi_+$ for SKYNET, and the other photo-z estimates are also greatly improved relative to the
spectroscopic prediction. Relaxing this to a bias parameter for each redshift bin does not
further significantly improve the bias in $\sigma_8$, but it does have a large impact on the
agreement in $\xi_+$, which could have an impact on other parameter constraints. All
tomographic points are now consistent with zero for the machine learning methods. This is
confirmed in the Bayes factor, shown in Table 5 for the three-parameter case. All values of
$K$ are consistent with the four corrected photo-z estimates being equally likely to be true.

We thus employ a Gaussian prior on the photo-z bias of width $\delta z_i = 0.05$, centred at
zero, separately for each of the tomographic bins in the fiducial cosmology analysis of
DES et al. 2015. We also explore the effect of propagating a non-zero centre for the prior
in the analysis discussed in that paper, and find no significant differences to the
cosmology results.


6.3 Photo-z impact on other lensing analyses 

In general, the main impact of photo-z uncertainties in weak lensing measurements enters
through the impact on the critical surface density $\Sigma_{\rm crit}$. This quantity captures
the information on distance ratios in lens-source pairs that lensing is sensitive to, namely

$$\Sigma_{\rm crit}^{-1} = \frac{4\pi G}{c^2}\,\frac{D_l D_{ls}}{D_s}, \qquad (5)$$

where $D_l$ is the angular diameter distance to the lens, $D_s$ is the distance to the
source, and $D_{ls}$ is the distance between lens and source. Calculating
$\Sigma_{\rm crit}^{-1}$ uses the individual p(z) for each galaxy, which is a different test of
the photo-z quality than the bulk summation into large tomographic bins for cosmic shear
analysis. It is also possible to directly calculate this quantity for a relatively small
sample of galaxies, unlike the correlation function, allowing us to directly compare the
photo-z methods' predictions for this quantity with the weighted matched spectroscopic
prediction.

To explore this, we compare the impact of the different redshift estimates on the
calculation of $\langle\Sigma_{\rm crit}^{-1}\rangle$ as a function of lens redshift. This
directly probes the impact of photo-z bias in measurements of $\Delta\Sigma$ in cluster and
galaxy-galaxy lensing, and is relevant for other tangential shear measurements where one can
distinguish between a population with significantly better photo-z estimates and a source
sample in the larger shear catalogue. We will assume that lens galaxies have negligible
redshift error relative to the source catalogue and thus have no impact on the calculation
of $\langle\Sigma_{\rm crit}^{-1}\rangle(z_{\rm lens})$ for the purpose of evaluating the redshift
estimates presented in this paper. We follow the same process as described above, repeating
this analysis for the deep matched spectroscopic sample (Test 2a) and for the full DES SV
NGMIX shear catalogue.

Figure 15. The fractional difference in $\langle\Sigma_{\rm crit}^{-1}\rangle$ between the
photo-z estimates and the deep Test 2a spectroscopic prediction
is shown as a function of lens redshift for four source redshift bins.
Grey bands show the $1\sigma$ statistical error in the measurement of
the tangential shear signal for the three lens bins indicated by the
width of the bands.

For each galaxy sample and photo-z estimate, we evaluate the weighted mean inverse
$\Sigma_{\rm crit}$ as a function of lens redshift

$$\langle\Sigma_{\rm crit}^{-1}\rangle(z_{\rm lens}) = \sum_i w_i\, p(z_{{\rm source},i})\, \Sigma_{\rm crit}^{-1}(z_{\rm lens}, z_{{\rm source},i}), \qquad (6)$$

where $\sum p(z) = 1$. For the spectroscopic test, $\langle\Sigma_{\rm crit}^{-1}\rangle_{\rm spec}$
is simply evaluated at the spectroscopic redshift with no probability distribution. We use
three source redshift bins: $0.5 < z_{\rm source} < 1.3$, $0.7 < z_{\rm source} < 1.3$, and
$0.9 < z_{\rm source} < 1.3$, as well as the non-tomographic range from the two-point
analysis, $0.3 < z_{\rm source} < 1.3$. We calculate $\langle\Sigma_{\rm crit}^{-1}\rangle$ over
the lens redshift range $0.1 < z_{\rm lens} < 0.9$, which brackets the redshift limits of the
lenses in the Red-sequence Matched-filter Galaxy Catalog (RedMaGiC), described in Rozo et
al. (in prep.), which selects red-sequence galaxies. The catalogue used in what follows is
limited to luminosity $L > L_*$, which results in approximately 30,000 lenses. To calculate
statistical errors for the figures, we use three lens redshift bins:
$0.2 < z_{\rm lens} < 0.4$, $0.4 < z_{\rm lens} < 0.6$, and $0.6 < z_{\rm lens} < 0.8$.
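The calculation in Eqs. (5) and (6) can be sketched in a few lines. The example below is a hedged illustration rather than the code used for Figs. 15 and 16: it assumes a flat ΛCDM cosmology with hypothetical parameter values (`H0 = 70`, `OMEGA_M = 0.3`), tabulates each source p(z) on a common grid, and divides by the sum of the weights so that the result is a weighted mean. All function names are invented for the example.

```python
import numpy as np

C_KM_S = 299792.458      # speed of light [km/s]
G_MPC = 4.30091e-9       # Newton's constant [Mpc (km/s)^2 / Msun]
H0 = 70.0                # assumed Hubble constant [km/s/Mpc]
OMEGA_M = 0.3            # assumed matter density (flat LCDM)

def comoving_distance(z, n_steps=2048):
    """Line-of-sight comoving distance [Mpc] in flat LCDM (trapezoid rule)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(OMEGA_M * (1.0 + zz) ** 3 + 1.0 - OMEGA_M)
    return C_KM_S / H0 * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))

def inverse_sigma_crit(z_lens, z_source):
    """Sigma_crit^{-1} [Mpc^2 / Msun]; zero if the source is not behind the lens."""
    if z_source <= z_lens:
        return 0.0
    chi_l = comoving_distance(z_lens)
    chi_s = comoving_distance(z_source)
    d_l = chi_l / (1.0 + z_lens)
    d_s = chi_s / (1.0 + z_source)
    d_ls = (chi_s - chi_l) / (1.0 + z_source)   # valid for a flat universe only
    return 4.0 * np.pi * G_MPC / C_KM_S ** 2 * d_l * d_ls / d_s

def mean_inverse_sigma_crit(z_lens, z_grid, pdfs, weights):
    """Weighted mean of Sigma_crit^{-1} over sources with tabulated p(z)."""
    kernel = np.array([inverse_sigma_crit(z_lens, z) for z in z_grid])
    pdfs = np.asarray(pdfs, dtype=float)
    pdfs = pdfs / pdfs.sum(axis=1, keepdims=True)    # enforce sum p(z) = 1
    weights = np.asarray(weights, dtype=float)
    per_galaxy = pdfs @ kernel                        # one value per source galaxy
    return np.sum(weights * per_galaxy) / np.sum(weights)
```

For the spectroscopic comparison, the same function can be used with each p(z) replaced by a single non-zero entry at the grid point nearest the spectroscopic redshift, which reduces the inner sum to a direct evaluation of Eq. (5).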

Figure 15 shows the resulting
$\Delta\langle\Sigma_{\rm crit}^{-1}\rangle/\langle\Sigma_{\rm crit}^{-1}\rangle_{\rm spec}$ for
Test 2a. We find good agreement between the photometric estimates and the matched
spectroscopic redshifts. SKYNET and TPZ have biases that are nearly consistent with zero in
all bins and lens redshifts, reaching levels comparable to the bootstrap errors over the
spectroscopic sample at high redshift. The worst performing method, BPZ, has a bias that
reaches only 15% at the high redshift limit. For comparison, we include the statistical
error on the magnitude of the tangential shear signal calculated via jackknife of the lens
sample over the DES SV footprint. The weighted tangential shear $\gamma_t(\theta)$ enters into
the calculation of $\Delta\Sigma$ linearly with $\Sigma_{\rm crit}^{-1}$. Except for BPZ, the
bias for all methods is typically much less than this statistical error. We exclude Test 1
due to there being insufficient galaxies in the higher redshift bins to produce a ratio that
is not dominated by noise, but have verified that in the lowest redshift bin, for example,
there is negligible bias consistent with that shown in Fig. 15 for Test 2a.

We repeat the same analysis for the full DES SV NGMIX shear catalogue in Fig. 16. The left
panel shows $\langle\Sigma_{\rm crit}^{-1}\rangle$ as a function of lens redshift for each
photo-z estimate; the estimates agree well with each other. The differences are quantified
in the right panels for each source redshift bin, where the fractional difference from the
mean is shown. The spread in relative differences between the codes is within 5% of that
seen for the deep Test 2a in Fig. 15, which suggests that the bias shown in Fig. 15 is a
good estimate of that expected in DES SV measurements of
$\langle\Sigma_{\rm crit}^{-1}\rangle$.


7 CONCLUSIONS 

The Dark Energy Survey aims over five years of observations to combine the measurements of
shapes and redshifts for hundreds of millions of galaxies to constrain cosmological
parameters and to study the evolution and structure of dark energy and dark matter. The
determination of accurate redshift distributions for these galaxies is one of the primary
challenges for DES and for future weak lensing surveys, and may become the dominant
systematic limitation in pursuing cosmology through precision weak lensing measurements. We
have presented in this work an analysis of the resulting redshift distributions of galaxies
with shape measurements from the pre-survey Science Verification data for DES (DES SV), and
identified key challenges and obstacles in the pursuit of producing accurate redshift
distributions for the main DES survey data releases at the level required to support ongoing
DES weak lensing science.

We have compiled a set of more than 46,000 spectroscopic galaxies, which are matched in
image depth and weighted to ensure even sampling of the weak lensing sample. These galaxies
are split into training and validation samples, as well as an independent validation sample
and a deep validation sample, the latter of which overlaps with the primary validation
sample. The independent sample is taken from a separate spectroscopic field (VVDS-F14),
while the deep sample is closer to the DES SV magnitude distribution. These spectroscopic
samples are used as part of a larger test suite to verify and characterise the performance
of the four photometric redshift codes compared in this paper: ANNZ2, BPZ, SKYNET, and TPZ.

We identify challenges in producing photometric redshifts with the spectroscopic samples
available to us and DES photometry, including learning the radial profile of the
spectroscopic distributions in machine learning codes and mis-characterisation of the
redshift in template-based approaches due to the limitations of our photometric bands and
template colour coverage. This can result in artificial features in the photometric n(z),
which will bias any resulting analysis that depends on the photometric redshift
distribution. We also discuss the challenge of compiling representative and complete
spectroscopic training sets. However, we demonstrate that the potential bias in mean
redshift due to spectroscopic incompleteness does not exceed the expected sample variance
uncertainty in our presently available samples due to their small size.

Figure 16. Left: $\langle\Sigma_{\rm crit}^{-1}\rangle$ for the full NGMIX shear catalogue is shown as a function of lens redshift for four source redshift bins:
$0.3 < z_{\rm source} < 1.3$ (solid), $0.5 < z_{\rm source} < 1.3$ (dashed), $0.7 < z_{\rm source} < 1.3$ (dotted), and $0.9 < z_{\rm source} < 1.3$ (dash-dotted). Right: The
fractional difference in $\langle\Sigma_{\rm crit}^{-1}\rangle$ between the photo-z estimates relative to the mean. Grey bands show the $1\sigma$ statistical error in the
measurement of the tangential shear signal for the three lens bins indicated by the width of the bands.

In order to mitigate the potential issues associated with any given photometric redshift
approach, we apply three independent methodologies: the first based on empirical
spectroscopic data and utilising machine learning techniques; the second a modelling-based
approach, comprising a template-fitting routine (BPZ) and a first-order correction of the
associated model biases by image simulations (using BCC and UFig); and finally employing
highly accurate empirical photometric redshifts from COSMOS, which have been selected to
mimic our weak lensing sample. We find the mean redshift of the shear catalogue to be
z = 0.72. The variance in this mean and those of the three tomographic bins are consistent
with a Gaussian distribution of width 0.05. Therefore, in the companion cosmology paper
(DES et al. 2015), we marginalise over the photometric redshift calibration uncertainty
using independent Gaussian priors of width 0.05 in each photometric redshift bin.

We propagate these photo-z uncertainties and biases through to measurements that are most
relevant to weak lensing science, which is a necessary step to provide useful
characterisations of photo-z biases for DES SV analysis papers. For each of the independent
and deep weighted spectroscopic validation sets, we compare for each photo-z estimate the
resulting measures of $\xi_+$ and the resulting constraints on $\sigma_8$, as well as
resulting measurements of $\langle\Sigma_{\rm crit}^{-1}\rangle$. This provides us with direct
estimates of expected biases on typical weak lensing measurements and cosmological
parameters of interest, and allows us to validate methods of marginalising over photo-z
biases.

We find that, compared to the weighted spectroscopic validation sets, we should expect a
level of bias for the fiducial photo-z estimates of less than about 10% in $\xi_+$, which
corresponds to a $1\sigma$ deviation or bias of 2-3% in $\sigma_8$ for the fiducial SKYNET
method, given DES SV statistical power. We verify an approach to mitigate this bias by
marginalising over bias parameters that shift the mean redshift of each tomographic bin,
demonstrating that this is a sufficient approach to remove any bias in $\xi_+$ and
$\sigma_8$. A similar analysis of $\langle\Sigma_{\rm crit}^{-1}\rangle$ finds a bias for the
fiducial photo-z estimate that increases to approximately 5% for the highest redshift
lenses, but which is negligible for most lens redshifts.

Looking towards the future of the DES and beyond, weak lensing-oriented photo-z estimation
will face a number of challenges. Firstly, in order to remain comparable to the expected
statistical uncertainties in a 5000 deg$^2$ survey, the systematic uncertainties on the mean
redshift within a given tomographic bin will need to be reduced from $\delta z \sim 0.05$ to
an eventual level of $\delta z \sim 0.003$. Moreover, extracting the greatest amount of the
information in the lensing signal will require the use of finer tomographic binning.
Finally, the detailed topology of the p(z) in a given tomographic bin will come under
increasing scrutiny, and marginalising over simple redshift bias parameters in the mean is
unlikely to be sufficient in future cosmology analyses. Our testing metrics will need to be
expanded to include those more sensitive to PDF information on a galaxy-by-galaxy basis
(e.g. Bordoloi et al. 2010) in order to account for this shift in emphasis.

The methodologies employed to produce photo-zs can be improved upon by exploring better
galaxy templates in modelling approaches to mitigate problems observed in this work, and the
incorporation of galaxy information beyond magnitude and colour may be key to breaking
degeneracies in the machine learning PDFs. Coupled with algorithmic improvements is the
increasing availability of data. For instance, the year 1 DES survey data cover further key
spectroscopic fields in Stripe 82, BOSS, DEEP2 and WiggleZ. Wide field spectroscopic fields,
even those biased towards the brightest objects, open up new possibilities in the form of
cross-correlation analyses (Newman 2008). Meanwhile, further exquisite photometric fields
will also be covered and should allow us to conduct comparisons similar to the one we
performed with COSMOS in this work, but with reduced sample variance concerns. Despite these
foreseen advances in weak lensing photo-z techniques, there still remains the separate issue
of validating the derived redshifts. To be fully confident in both the redshifts and the
estimated uncertainties that we find with the various photo-z techniques, the need for
additional deep, but highly complete, spectroscopy is unavoidable.
Appendices

A DETAILS OF MATCHED SPECTROSCOPIC 
SAMPLE 

In this appendix, we note all the quality flags that are used
in the matched spectroscopic catalogue and their meaning.

• 2dFGRS: All galaxies with flags 3, 4 or 5; all of these are
considered to be reliable redshifts (Colless et al. 2001).

• ACES: All galaxies with flags 3 or 4; these are labelled as
secure and very secure redshifts (Cooper et al. 2012).

• ATLAS: This survey (Mao et al. 2012) has no quality
flags; all objects classified as galaxies were kept.

• OzDES: All galaxies with quality flag 4; galaxies with
this flag are expected to have the correct redshift more than
99% of the time (Yuan et al. 2015).

• ELG Cosmos: All galaxies with quality flags 3 or 4; these
correspond to clear single line redshift identification and a
secure redshift, respectively (Comparat et al. 2015).

• GAMA: All galaxies with quality flag 4; these are labelled
as certain redshifts (Driver et al. 2011).

• PanSTARRS AAOmega: All galaxies with quality flag 3
or 4; galaxies with these flags are expected to have the correct
redshift more than 95% or 99% of the time, respectively
(Rest et al. 2014; Scolnic et al. 2014; Kaiser et al. 2010).

• PanSTARRS MMT: All galaxies with quality flag 3 or 4;
these are labelled as probable and as certain redshifts (Rest
et al. 2014; Scolnic et al. 2014; Kaiser et al. 2010).

• SDSS DR10: All galaxies with quality flag 0; these are
labelled as reliable (Ahn et al. 2014).

• SNLS AAOmega: All galaxies with quality flag 4 and 5;
these are labelled as reliable and reliable with more than 3
clearly visible features (Lidman et al. 2012).

• SNLS: All galaxies with quality flag 1 and 2; these are
labelled as reliable based on several strong detected features
and on one clearly detected feature, usually [OII] (Balland
et al. 2015).

• UDS: All galaxies observed with VIMOS that have quality
flags 3 and 4; these are labelled as secure. All galaxies
observed with FORS2 that have quality flags A, B or B*,
where A and B are labelled as secure and B* is labelled as
reliable. See Bradshaw et al. (2013); McLure et al. (2013) for
more information.

• VIPERS: All galaxies that have flags 3 and 4; these are
labelled as reliable (Garilli et al. 2014).

• zCOSMOS: All galaxies that have flags 3 and 4; these are
labelled as secure and very secure redshifts (Lilly et al. 2009).

• VVDS: All galaxies that have flags 3 and 4; these are
labelled as secure and very secure redshifts (Garilli et al.
2008; Le Fèvre et al. 2004).


B SELF-SELECTION TOMOGRAPHIC 
ANALYSIS 

We repeat here Tests 1, 2a, and 2b, but now allow each code to assign a galaxy to each
redshift bin based on its own estimate of the mean PDF instead of that of SKYNET, as was
done in Sec. 5. Figs. 17, 18, and 19 show the performance of the four methods. Table 6
shows the offsets of the mean of the estimated redshift distributions with respect to the
weighted spectroscopic distribution. There is not a clear benefit to enforcing separate
tomographic binning based on each photo-z method and repeating the analysis pipelines in
the companion papers for DES SV, as some methods perform better and others worse when using
the fiducial SKYNET binning.

          z range        ANNZ2     BPZ       SKYNET    TPZ
Test 1    0.30 - 0.55    0.017     -0.005    0.003     0.004
          0.55 - 0.83    0.018     0.01      0.017     0.016
          0.83 - 1.30    0.032     0.077     0.063     0.050
Test 2a   0.30 - 0.55    0.049     0.002     0.027     -0.013
          0.55 - 0.83    0.015     -0.025    0.034     0.031
          0.83 - 1.30    0.086     0.046     0.044     0.069
Test 2b   0.30 - 0.55    0.015     -0.015    0.012     -0.020
          0.55 - 0.83    0.011     -0.027    0.013     0.008
          0.83 - 1.30    0.025     0.007     0.022     0.028

Table 6. The bias ($\langle z_{\rm phot}\rangle - \langle z_{\rm spec}\rangle$) between the photometric
redshift estimates and the true spectroscopic distribution in Test
1 (independent), Test 2a (VVDS-Deep), and Test 2b (Full validation
set) when the codes each assign their own binning to the
galaxies.

Figure 17. The weighted spectroscopic redshift distribution n(z) (shaded area) compared to the estimates of the four codes for the
Test 1 (VVDS-F14) galaxies. Unlike in Sec. 5, all codes assign galaxies to tomographic bins according to their own mean PDF estimates,
hence the objects in each bin differ for each panel in the plot.

Figure 18. The weighted spectroscopic redshift distribution n(z) (shaded area) compared to the estimates of the four codes in Test 2a
(VVDS-Deep galaxies in the validation set). Unlike in Sec. 5, all codes assign galaxies to tomographic bins according to their own mean
PDF estimates, hence the objects in each bin differ for each panel in the plot.

Figure 19. The weighted spectroscopic redshift distribution n(z) (shaded area) compared to the estimates of the four codes in Test 2b
(Full validation set). Unlike in Sec. 5, all codes assign galaxies to tomographic bins according to their own mean PDF estimates, hence
the objects in each bin differ for each panel in the plot.


C PHOTO-Z METHODS
C.1 ANNZ2

ANNZ2 (Sadeh et al. 2015) 4 is an updated version of the neural network code ANNz
(Collister & Lahav 2004). ANNZ2 differs from its previous version by incorporating several
additional machine learning methods beyond Artificial Neural Networks (ANNs), such as
Boosted Decision Trees (BDTs) and k-Nearest Neighbours (KNN) algorithms. These are
implemented in the TMVA package (Hoecker et al. 2007) 5 .

4 https://github.com/iftachSadeh/ANNZ

5 TMVA is a part of the ROOT C++ software framework (Brun & Rademakers 1997)

For the 100 ANNs run on the spectroscopic training set, we randomly varied: the number of
nodes in each layer; the number of training cycles; the usage of the so-called Bayesian
regulator, which reduces the risk of over-training; the type of activation function; the
type of variable transformation performed before training (such as normalisation and PCA
transformation); the number of subsequent convergence tests which have to fail to consider
the training complete; and the initial random seed. After training is complete, the
performance of each method is quantified through an optimisation process, which leads to a
single nominal photo-z estimator for ANNZ2. The entire collection of solutions is used in
order to derive a p(z), constructed in two steps. First, each solution is folded with an
error distribution, which is derived using the KNN error estimation method of Oyaizu et al.
(2008). The ensemble of solutions is then combined using an optimised weighting scheme.
This methodology allows us to take into account both the intrinsic errors on the input
parameters for a given method, and the uncertainty on the method itself. The methodology
described above is called "randomised regression". Another important feature implemented in
ANNZ2 is the weighting method (Lima et al. 2008). It is therefore possible to provide a
reference sample as input and re-weight the training set to make the distributions of its
relevant variables more representative of the former; this technique was applied in this
work.
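The idea behind re-weighting a training set towards a reference sample can be illustrated with a nearest-neighbour density ratio. The sketch below is written in the spirit of the weighting method cited above and is not the ANNZ2 implementation: the function name `knn_weights`, the Euclidean metric, the brute-force distance matrices, and the default `k = 5` are all assumptions made for the example.

```python
import numpy as np

def knn_weights(train, reference, k=5):
    """Weights for training objects so their density traces the reference sample.

    train:     array of shape (n_train, n_features), e.g. magnitudes and colours
    reference: array of shape (n_ref, n_features) in the same feature space
    """
    train = np.asarray(train, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # brute-force pairwise Euclidean distances (adequate for small samples)
    d_tt = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=-1)
    d_tr = np.linalg.norm(train[:, None, :] - reference[None, :, :], axis=-1)
    # distance to the k-th nearest training neighbour; column 0 is the object itself
    radius = np.sort(d_tt, axis=1)[:, k]
    # number of reference objects inside the same hypersphere
    n_ref = np.sum(d_tr <= radius[:, None], axis=1)
    # ratio of local densities, up to a constant that the normalisation removes
    weights = n_ref / float(k)
    return weights / weights.sum()
```

Training objects in regions of feature space that the reference sample does not populate receive weights close to zero, while those in regions that are under-represented in the training set are up-weighted.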


C.2 BPZ 

The BPZ (Bayesian Photometric Redshifts) photo-z code
(Benítez 2000; Coe et al. 2006) is a model fitting code that
fits galaxy templates to the measured photometry and its
associated errors. BPZ calculates the likelihood of the galaxy
for the best fitting template, which then, using Bayes'
theorem, is combined with a prior to produce the posterior
probability. The prior represents our previous knowledge of the
redshift and spectral type distributions of the sample in the analysis.

• Templates: We use the eight spectral templates that
BPZ carries by default based on Coleman et al. (1980); Kinney
et al. (1996), and add two more interpolated templates
between each pair of them by setting the input parameter
INTERP=8 (option by default).

• Prior: We explicitly calibrate the prior in each test
by fitting the empirical function $\Pi(z, t \mid m_0)$ proposed in
Benítez (2000) to the weighted training set, although we
note that using the weighted or unweighted training set to
get the prior had a negligible effect on photo-z performance.


C.3 SkyNet 

SKYNET (Graff & Feroz 2013) is a neural network algorithm that uses a second-order method
based on a conjugate gradient algorithm to find the optimal weights of the network. SKYNET
classifies galaxies into classes, in this case redshift bins, where the last layer is a
softmax transformation that is able to estimate the probability that an object belongs to a
certain class (or bin) (Bonnett 2015). The number of classes is the redshift bin resolution
of the pdf. In this work SKYNET is run slightly differently than in Sánchez et al. (2014);
Bonnett (2015). SKYNET is run 10 times with the same network configuration but with a
slightly shifted binning each time. We train with a nominal bin width of $\Delta z = 0.09$;
these are referred to as the broad bins. The broad bins are then shifted by
$\delta = 0.009$ every training run so that $\Delta z$ is sampled in 10 locations, leading to
an overall sampling of $\delta z = 0.009$. This produces 200 bins between z = 0.005 and
z = 1.8. After the 10 networks have been trained, the pdf values at $z_i$ are taken to be
the average of all the broad bins that $z_i$ lies within. This means that the SKYNET
photometric redshifts have an intrinsic smoothing built into them. All the networks have
the same architecture: 3 layers with 16, 14, and 20 nodes per layer and a tanh activation
function. The features fed to the network are the MAG_AUTO i, r and all possible colour
combinations of the four bands. In this work we make use of the python wrapper pySkyNet 6
of the SKYNET library.


6 http://pyskynet.readthedocs.org/ 
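The averaging of shifted broad bins described above can be sketched as follows. This is an illustrative reading of the procedure, not code from SKYNET or pySkyNet: the bin layout (run k starting at `z_min + k * SHIFT`), the conversion of class probabilities to densities, and the function name `combine_shifted_runs` are assumptions.

```python
import numpy as np

BROAD_WIDTH = 0.09    # nominal broad-bin width quoted in the text
SHIFT = 0.009         # offset between successive training runs
N_RUNS = 10           # number of shifted networks

def combine_shifted_runs(run_probs, z_fine, z_min=0.0):
    """Average broad-bin class probabilities from shifted runs onto a fine grid.

    run_probs[k][j] is the probability assigned to broad bin j by run k, whose
    bins are assumed to start at z_min + k * SHIFT (hypothetical layout).
    """
    z_fine = np.asarray(z_fine, dtype=float)
    pdf = np.zeros_like(z_fine)
    for k, probs in enumerate(run_probs):
        probs = np.asarray(probs, dtype=float)
        start = z_min + k * SHIFT
        # index of the broad bin of run k that contains each fine-grid point
        idx = np.floor((z_fine - start) / BROAD_WIDTH).astype(int)
        inside = (idx >= 0) & (idx < probs.size)
        pdf[inside] += probs[idx[inside]] / BROAD_WIDTH   # probability density
    return pdf / len(run_probs)
```

Because each fine-grid value is an average over broad bins of width 0.09 that overlap it, a sharp feature in redshift is spread over roughly twice that width, which is the intrinsic smoothing noted in the text.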


C.4 TPZ 

TPZ 7 (Carrasco Kind & Brunner 2013) is a machine learning algorithm that uses prediction
trees and random forest techniques to produce robust photometric redshift PDFs. Prediction
trees are built by asking a series of questions that recursively split the input data taken
from the spectroscopic sample into two branches, until a terminal leaf is created that meets
the stopping criterion. The method by which the data are divided is chosen to be the one
with the highest information gain among the random subsample of features chosen at every
point. This produces less correlated trees that act as weak learners that can be combined
into a strong predictor. All objects in a terminal leaf node represent a specific subsample
of the entire data with similar properties. Additional data are created before the trees
are constructed by perturbing the data using their magnitude errors; this is sometimes
referred to as a parametric bootstrap. In this work 200 trees were created whose results
were aggregated to construct each individual PDF. For the application to DES SV data, we
have used griz MAG_AUTO magnitudes together with all the corresponding colours and their
associated errors. We discretised the redshift space into 100 bins up to z = 1.8 and
adopted a smoothing scale of 5 times the bin size.
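Two of the steps above, the parametric bootstrap of the magnitudes and the aggregation of tree outputs into a binned, smoothed PDF, can be sketched with a few lines of NumPy. This is a hedged illustration and not the TPZ code: the Gaussian noise model, the boxcar smoothing kernel, and the function names `perturb_magnitudes` and `aggregate_pdf` are assumptions; only the 100 bins, the z = 1.8 upper limit, and the smoothing over 5 bins come from the text.

```python
import numpy as np

def perturb_magnitudes(mags, mag_errs, n_realisations, seed=0):
    """Parametric bootstrap: draw noisy copies of the magnitudes from their errors."""
    rng = np.random.default_rng(seed)
    mags = np.asarray(mags, dtype=float)
    mag_errs = np.asarray(mag_errs, dtype=float)
    noise = rng.standard_normal((n_realisations,) + mags.shape)
    return mags[None, ...] + noise * mag_errs[None, ...]   # assumed Gaussian errors

def aggregate_pdf(tree_redshifts, n_bins=100, z_max=1.8, smooth_bins=5):
    """Histogram the redshifts returned by all trees and smooth the result."""
    counts, edges = np.histogram(tree_redshifts, bins=n_bins, range=(0.0, z_max))
    kernel = np.ones(smooth_bins) / smooth_bins             # assumed boxcar kernel
    smoothed = np.convolve(counts.astype(float), kernel, mode="same")
    width = edges[1] - edges[0]
    return edges, smoothed / (smoothed.sum() * width)       # integrates to one
```

In such a scheme each perturbed realisation would be passed through the trained trees, and the redshifts of the training objects in the leaves reached would be collected into `tree_redshifts` for a given galaxy.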


D PRIMUS, AN EXAMPLE OF EXTREME SELECTION EFFECTS

In building the spectroscopic training and validation samples, we have excluded any
galaxies from the PRIMUS survey (Cool et al. 2013). Here we will discuss some of the
complications of using PRIMUS galaxies as part of the training or validation samples.
PRIMUS is a spectroscopic survey covering a total of 9.1 deg$^2$ containing 185,105
galaxies, of which we have matched 88,040 galaxies that have DES SV photometry within 1.5
arcseconds, using only the two highest PRIMUS quality flags, 4 and 3. The PRIMUS redshifts
are obtained by fitting low resolution spectra and any matched photometry to an empirical
library of spectra based on the AGES spectra (Kochanek et al. 2012). The PRIMUS redshifts
have two peculiarities, the first being that a non-negligible number of galaxies have a
different redshift when compared to objects with spectra from higher resolution
instruments. Cool et al. (2013) estimate $\sigma_{\delta z/(1+z)} = 0.005$ and 0.022 for
quality flags 4 and 3, while we find 0.004 and 0.010 for all the matched objects within the
DES survey. The top two panels of Fig. 20 show this comparison of the PRIMUS spectroscopic
redshifts with matched spectroscopic redshifts from higher resolution instruments. This
leads us to consider the unresolved question of how robust ML and other calibration methods
are to incorrect spectra, which is not a question that we attempt to answer in this work,
but one for which there has been some work in general in the ML literature (e.g., Nettleton
et al. 2010; Cunha et al. 2014).

The second PRIMUS feature that is important for pho¬ 
tometric redshift estimation is the fact that galaxies are only 
fit up to z = 1.2. The cut at z = 1.2 is effectively a selection 
effect and hence, one must take care when using PRIMUS 

7 http://lcdm.astro.illinois.edu/research/TPZ.html 
Figure 20. An analysis of challenges related to the use of
PRIMUS spectroscopic redshifts as part of the DES SV training
or validation samples. Top panel: PRIMUS redshift vs the
matched spectroscopic redshift from higher resolution
instruments. The blue dots are the highest quality flag 4,
while the red dots are the second highest quality flag 3.
Second panel: The fractional difference of the redshifts
between PRIMUS and the other surveys. Third panel: The
spectroscopic redshift distribution of PRIMUS galaxies
between 1.0 < z < 1.3. Around z ~ 1.2, there is a large drop
in the spectroscopic redshift distribution due to the fact
that galaxies have a maximum fitting redshift of z = 1.2,
while AGN are fit up to z = 5.0. Bottom panel: The effect of
this drop in PRIMUS n(z) on the final estimation of the n(z)
for the DES SV shear catalogue. Shown are two examples of
including or excluding the PRIMUS galaxies using SKYNET,
where the feature at z = 1.2 is clearly imprinted on the n(z)
of the weak lensing sample when PRIMUS galaxies are included
in the training.


to train. To illustrate this, consider a galaxy at z = 1.2
observed by PRIMUS and DES for which we want to estimate
the p(z). In the idealised case of a Gaussian pdf, the mean
would be located around z = 1.2 and there would be tails in
the p(z) extending to lower and higher redshift. Given that
there are no galaxies beyond z = 1.2 in PRIMUS, none of
the ML methods will be able to learn that some probability
should extend beyond z = 1.2. Even when assessing how
well a template fitting method performs, the lack of spectra
beyond z = 1.2 may lead one to believe the performance is
poor. These features are demonstrated in the bottom two
panels of Fig. 20. In the bottom panel of Fig. 20, we provide
a real example of the difference in the reconstructed n(z) for
the weak lensing sample around z = 1.2 when trained with
and without PRIMUS. Though this is an extreme case of a
selection effect imprinting itself on the reconstructed n(z),
it is possible that similar, more subtle effects persist in the
ML photometric redshift estimates. There are a large number
of PRIMUS spectra, however, and careful efforts should
be made to find ways to utilise these in the future.
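
The selection effect described above can be reproduced in a
toy setting. The sketch below is not one of the methods used
in this work: it uses a simple nearest-neighbour estimator
as a stand-in for an ML photo-z code, a single hypothetical
noisy feature in place of real photometry, and an invented
uniform redshift distribution. It only illustrates that a
training sample truncated at z = 1.2 cannot place any
estimate beyond z = 1.2, whatever the true redshifts of the
target galaxies are.

```python
import numpy as np

def nearest_neighbour_photoz(train_feature, train_z, target_feature):
    """Give each target the redshift of its nearest training
    neighbour in a one-dimensional feature space."""
    order = np.argsort(train_feature)
    f_sorted = train_feature[order]
    z_sorted = train_z[order]
    # Index of the first training feature >= each target feature,
    # clipped so that both idx - 1 and idx are valid positions.
    idx = np.searchsorted(f_sorted, target_feature)
    idx = np.clip(idx, 1, len(f_sorted) - 1)
    left = f_sorted[idx - 1]
    right = f_sorted[idx]
    use_left = (target_feature - left) <= (right - target_feature)
    return np.where(use_left, z_sorted[idx - 1], z_sorted[idx])

rng = np.random.default_rng(0)

# Toy universe: uniform true redshifts and one noisy feature
# that traces redshift (both are assumptions of this example).
z_true = rng.uniform(0.2, 1.5, size=20000)
feature = z_true + rng.normal(0.0, 0.1, size=z_true.size)

is_train = rng.random(z_true.size) < 0.5
train_cut = is_train & (z_true <= 1.2)  # mimics the z = 1.2 fitting limit
target = ~is_train

z_est_full = nearest_neighbour_photoz(
    feature[is_train], z_true[is_train], feature[target])
z_est_cut = nearest_neighbour_photoz(
    feature[train_cut], z_true[train_cut], feature[target])

# Fraction of target galaxies placed beyond z = 1.2 in each case.
frac_true = np.mean(z_true[target] > 1.2)
frac_full = np.mean(z_est_full > 1.2)
frac_cut = np.mean(z_est_cut > 1.2)
```

With the truncated training sample, `frac_cut` is exactly
zero even though a sizeable fraction of the target galaxies
lie beyond z = 1.2. This is the toy analogue of the feature
seen in the bottom panel of Fig. 20.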